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We present a novel quantum tomographic reconstruction method based on Bayesian inference via the Kalman 
filter update equations. The method not only yields the maximum likelihood/optimal Bayesian reconstruction, 
but also a covariance matrix expressing the measurement uncertainties in a complete way. From this covariance 
matrix the error bars on any derived quantity can be easily calculated. This is a first step towards the broader 
goal of devising an omnibus reconstruction method that could be adapted to any tomographic setup with little 
effort and that treats measurement uncertainties in a statistically well-founded way. 

In this first part we restrict ourselves to the important subclass of tomography based on measurements with 
discrete outcomes (as opposed to continuous ones), and we also ignore any measurement imperfections (dark 
counts, less than unit detector efficiency, etc.), which will be treated in a follow-up paper. We illustrate our 
general theory on real tomography experiments of quantum optical information processing elements. 



PACS numbers: 03.65.Wj,06.20.Dk,42.50.-p 
I. INTRODUCTION 

Since the first proposal of optical quantum tomography by 
Vogel and Risken (|T|], and the first practical tomographic ex- 
periments by Smithey et al f2], quantum tomography has gone 
a long way, and is now being used in a variety of physical 
setups, not restricted to optical systems, and many improve- 
ments have been made to the original reconstruction methods 
fi, 4]. While this is certainly a desirable evolution, it must 
also be said that on the negative side this resulted in a pro- 
liferation of tomography methods. While it is unavoidable 
that every physical system has its own peculiarities, and each 
particular setup calls for its own tomographic measurement 
method, it is not conceivable that for every type of system and 
for every tomography method there should also be a differ- 
ent tomographic reconstruction method. Furthermore, when 
each time the reconstruction software is written from scratch, 
that does not benefit reliability. After 20 years of tomograph- 
ical experience it is not unreasonable to ask for a unification 
of these methods, taking the best out of each and devising a 
small set of "omnibus" reconstruction methods, that only need 
some small adaptations to the particular tomographic setup. 

An even more important remark concerning reconstruction 
methods is the fact that error bars are seldom given. Measure- 
ments are worthless without error bars. When tomography is 
just a measurement, even though a complicated one, why then 
treat tomography differently? Error bars are dearly needed 
here as well, since the whole purpose of tomography is to 
come up with a description of the quantum state that is suf- 
ficient to derive further properties, and for these properties er- 
ror bars would certainly be needed. As Rehacek, Mogilevtsev 
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and Hradil jsj] stated: "The quantification of relevant statisti- 
cal errors is an indispensable but often neglected part of any 
tomographic scheme used for quantum diagnostic purposes." 

Some theoretical papers mention error bars, but they are 
calculated from simulated data, using Monte-Carlo methods 
and are only meant to give an indication about how good the 
method performs. As measurement errors depend on the ac- 
tual system and its state, this is clearly not enough. What we 
are after is error bars produced straight from the experimental 
data and the underlying statistical model. 

A widely used error criterion is the state fidelity (for quan- 
tum states) or the process fidelity (for quantum processes), 
which compares the reconstructed state to a predefined "de- 
sired state" (e.g. [6, 7, 8]). While it is easy to use, it clearly 
gives no indication about the reconstruction alone but is the 
sum of reconstruction and construction errors. As such, it 
cannot answer the following two questions separately: "Are 
we seeing the correct state?", and "Are we seeing the state 
correctly?" Furthermore, as pointed out in |5], fidelity is just 
a single number and one cannot expect it to describe the com- 
plete error structure of the reconstruction. 

One could argue, however, that error bars are not explic- 
itly needed if one just subsumes all measurement noise into 
the estimated quantum state via statistical mixing. We dis- 
agree with this point of view, and we claim that this throws 
away useful information. There is a difference between, on 
one hand, preparing a pure state and assuming it is mixed be- 
cause the measurements do not allow to conclude otherwise, 
and, on the other hand, not being able to prepare a pure state, 
obtaining a mixed state instead, and knowing that state per- 
fectly. In both cases the outcome is the same, a mixed state, 
but in the former case the real state is actually pure. 

To make sense of the concept of error bars in the context 
of state (process) estimation, we have to clearly distinguish 
between the roles of the state preparer and the observer mea- 
suring the state, both of which involve noise. Tomography is 
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based on the assumption that every time the observer measures 
the state, it is actually the same state. Because of measure- 
ment noise, the observer cannot obtain perfect knowledge of 
which state has been prepared in finite time. However, noth- 
ing prevents him from doing so in principle. Measuring for 
an infinite time, using a sufficient set of measurements, will 
yield perfect knowledge. If the same state is being prepared, 
it is ultimately knowable. On the other hand, the preparation 
of the state will also involve noise. Every time the preparer 
attempts to prepare the nominal state, noise will affect this 
and some slightly other state will result. This kind of noise is 
impossible to overcome, not even in principle. The only way 
one can deal with it is by absorbing the preparation noise into 
the quantum state that is being prepared, through statistical 
mixing. If the preparation noise is ergodic, the observer will 
recover the average state of the quantum state ensemble. 

In short: the individual states are not knowable, but their av- 
erage is; and if the number of measurements is finite, the ob- 
server will not obtain this average preparation state perfectly. 
Then error bars, or more precisely a density function over state 
space, are needed to express this lack of complete knowledge. 

One of the first papers calculating error bars from the mea- 
surement data is Ref. |9], for a specific reconstruction method 
of optical homo dyning tomography (OHT) using so-called 
pattern functions lUOl [Till . Using this method, the density ma- 
trix p can be derived directly from the detection probabilities 
pr(a;, 9) sampled over phase space, where x and 6 are parame- 
ters representing the settings of the OHT apparatus. To recon- 
struct p from the measurement data, these probabilities are re- 
placed by the relative frequencies /(x, 0)/N . To obtain error 
bars, the fluctuations on the detection frequencies are mod- 
eled by a Poisson process, by which the variance cr^ equals 
the mean pr divided by the number of runs N . The first draw- 
back of this method is that only the marginal variances of the 
density matrix elements are treated here, disregarding corre- 
lations between errors, and therefore overestimating them. A 
second drawback of this approach is that, due to pr not be- 
ing known, it is approximated by the relative frequency, giv- 
ing cr^ = pr/A^ w f /N"^, where / is the absolute detection 
frequency. This approximation actually underestimates the 
variance, especially for low-probability events. Indeed, for 
events with f — this approach assigns zero probability to 
the event, with zero variance, which corresponds to an abso- 
lute certainty. That certainty is not warranted given that only 
a finite number N of experiments were done. This problem is 
known more generally as the "zero-eigenvalue problem" and 
occurs in different guises in many other reconstruction meth- 
ods. 

The present work is a first step towards overcoming the two 
deficiencies described above: we propose to use Bayesian in- 
ference as the unifying principle to calculate a probability den- 
sity function over state space, from the measurement data, and 
from that density derive the first and second order moments: 
the mean state and the state covariance matrix. The goal of 
the present paper is to outline a practical method for calcu- 
lating this mean state and state covariance directly from a set 
of tomographic data, in a completely general and statistically 



well-founded way. 

During the course of this work, the paper [5] appeared, in 
which the same goals were aimed for and a method quite sim- 
ilar in spirit to ours was proposed. We refer the reader to Sec- 
tion |VT] for a discussion of the main differences between our 
method and the one in Jsl] . 

The input required by our method is a statistical model of 
the quantum tomographic setup. Given a state and the mea- 
surement settings, how do the statistical properties of the mea- 
surement outcomes depend on that? For a measurement with 
a finite number K of outcomes, a measurement setting cor- 
responds to a choice of operator elements A^'^', one element 
for each outcome. When the outcome is a continuous variable 
(e.g. position x), the measurement operator is parameterised 
by the continuous outcome value x. In either case, the laws of 
quantum mechanics dictate that the outcome k (or x) occurs in 
an experiment with a probability (or probability density) given 
by Born's rule pk = Ti[pA'-'^'>] (or p{x)dx — Tr[pA{x)]dx). 
The measurements taken in a tomographic experiment relate 
to this probability density, in one way or another. For any 
given setting a number of runs N would typically be per- 
formed, and in case of a finite outcome experiment, the fre- 
quencies fk of the various outcomes would be recorded, or 
the values of the measurement in case of a continuous out- 
come. The relation between these frequencies and the under- 
lying probability distribution is governed by the laws of statis- 
tics. 

Alternatively, in some experiments the outcome could be 
the combined effect of many small measurement events. For 
example, in tomography of atomic/molecular clouds, the flu- 
orescence response of the cloud to an impinging probe beam 
could be observed, in which case the experimental outcome is 
a fluorescence spectrum It 12l fisi [Till . The recorded spectrum 
is a random variate with expected value proportional to the rel- 
evant probability density (a marginal of the quasi-probability 
density describing the state) and variance depending on the 
signal-to-noise ratio. Another example is an optical homo- 
dyning tomography (OHT) experiment where the probe beam 
intensity is so high that individual photons impinging on the 
detectors can no longer be resolved and the measurement re- 
sults are light intensities, measured as voltages. 

Once the statistical model of the tomographic setup has 
been supplied in suitable form, a general-purpose Bayesian 
inference engine could in principle take it from there, convert- 
ing the measurement data into a posterior probability density 
over quantum state space. However, the actual calculations 
are typically too demanding to be at all practical. The second 
ingredient of our proposed solution is to make the calculations 
involved in Bayesian inference feasible by approximating the 
statistical tomographic model by a so-called linear-Gaussian 
model (explained below). For such models, the Bayesian in- 
ference simplifies to a set of simple equations known as the 
Kalman filter update equations. 

Kalman filtering is a technique for dynamical state estima- 
tion that allows to estimate a dynamical state from a sequence 
of noisy data ifisll . Kalman filtering has already been applied 
in the context of continuous quantum measurement and quan- 
tum control lfT6l [TtI [TsJ . For tomographic reconstruction, ap- 
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plying Kalman filtering seems to be a novel idea. 

The goals we have set out for this work are quite challeng- 
ing. Rather than trying to solve all the problems involved in 
one go, we focus here on a particular, but important, class of 
tomography experiments, namely those where the measure- 
ments have (few) discrete outcomes, as opposed to measure- 
ments of continuous variables. This class still covers a wide 
variety of quantum systems, including single-photon optical 
systems (e.g. optical quantum confuting) ialiSllfllSlllj spin 
systems based on ions ff], atoms (23\, or electrons (spin echo 
tomography) ||23ll , superconducting |8] and solid state qubits 
ll24ll . and tomography of atomic and molecular states based 
on fluorescence spectra H2l [Tsl [l4 1 . For the purposes of ex- 
position, we will restrict our attention here and illustrate our 
reconstruction method for optical systems. Important exam- 
ples of tomography experiments not falling in this class are 
optical homodyning and heterodyning iH |j, [Till , where the 
outcome is the continuous variable x. We leave this for future 
work. 

The second restriction we have imposed here is that we as- 
sume that the apparatus performing the tomography is ideal. 
In reality any measurement exhibits imperfections; e.g. pho- 
ton detectors have less than unit efficiency and may exhibit 
dark counts and input states for process tomography may be 
imperfectly prepared. Although these imperfections pose no 
deep fundamental problems, they would obscure the presenta- 
tion which is why we have chosen to treat them in a follow-up 
paper |25]. In the present paper we will to cover the main 
principles of our proposal and illustrate them using a simple 
real application (based on actual data) as proof of principle. 

A third (minor) restriction is that the number of experimen- 
tal runs per measurement setting is not too small, so that the 
distributions governing the measurement outcomes can be ap- 
proximated by normal distributions without too great an error. 

It goes without saying that our reconstruction method is 
suitable both for state tomography and process tomography, 
because state tomography lies at the basis of process tomog- 
raphy. Either one sends various input states through the quan- 
tum process and measures the output states (as in Ref. |6]), 
or one half of a maximally entangled state is sent through the 
process and the global output state (including the other half) is 
measured (as in Ref. |20]). Both methods are formally equiv- 
alent with state tomography of the state representative of the 
process, under the Choi-Jamiolkowski isomorphism. The pre- 
sentation of our method will therefore be mainly state based, 
for definiteness and simplicity. 

As we bring together a number of concepts from statistics, 
engineering mathematics and quantum mechanics, we begin 
our presentation with a rather lengthy section (Sec. |ll]i on 
background material, with an extensive list of notations, and 
brief overviews of Bayesian inference and Kalman filtering. 
In Sec. [Ill] we present the basic theory of our proposal, and 
show how the problem of tomographic reconstruction can be 
made to fit the mould of Kalman filtering, thereby propos- 
ing solutions to several problems that we encounter along the 
way. The theory is then illustrated on two real tomography 
applications in Secs. lIVI andlVl based on actual experimental 
tomography data. Finally, in Sec. |Vl] we highlight the main 



benefits of our method, its performance and the costs associ- 
ated with it, and compare it to existing methods, in particular 
to its sister-method proposed in [5]. 

II. THEORETICAL BACKGROUND 
A. Notations 

Let us start with introducing the main notations and ty- 
pographical conventions that we will use in the paper, along 
with some fundamental notions from applied probability the- 
ory. We denote vectors and matrices by symbols in boldface, 
F, /, to distinguish them from scalar quantities which we de- 
note in roman, including vector and matrix components, fi. 
Exceptions to this convention are quantum-mechanical quan- 
tities like states p, POVM elements 11 and maps $. The vector 
whose entries are all 1 is denoted by 1 :~ (1, . . . , 1), and the 
identity matrix by 1 . 

We adopt the usual convention from the statistics literature 
to denote random variables with capital letters, F, and the val- 
ues they can take with lowercase, /. For example, the prob- 
ability density function (PDF) of a random variable F is de- 
noted ./f(/)- The first / is the general symbol for a PDF, the 
second F is the random variable, and the third / is the argu- 
ment of the PDF and represents the values the random variable 
F can take. The mean and variance of a scalar random vari- 
able X are denoted by Ijl{X), a^{X), and the mean and the 
covariance matrix of a d-dimensional random variable X by 
Pl{X) andS;(X). 

In this paper, a number of distributions are prominent. Here 
we recall definitions and notations about the multinomial and 
normal distributions. Other distributions (chi-squared, Dirich- 
let and beta) will be described in subsequent sections. 

When a random d-dimensional variable F is distributed 
according to a multinomial distribution with parameters 

(where X]fc=i /fc ~ ^^'^ P ^^^^ is denoted by ~ 
Mtn(A^; p). The PDF of this distribution is given by 

/f(/)= (1) 

Here we denote the multinomial coefficient by 

fN\ _ N\ 

The mean of the multinomial distribution is given by 

/Zfc = Npk, 
and the entries of its covariance matrix are 




Npk{l-Pk). k^l 
-NpkPi, k ^ I. 



When a random variable X is distributed according to a 
univariate normal distribution with mean /i and variance 
we write this as X ~ A/'(/i, cr^). Similarly, for a multivariate 
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normal distribution with mean fi and covariance matrix S we 
write X ~ 7V(/i, S). 

We will reserve the symbol <j> for the PDF of a normal dis- 
tribution, while using / for general PDFs. The PDF of the 
univariate normal distribution wiU thus be denoted by 



1 



exp(. 



2a2 



Similarly, we will denote the PDF of an n-dimensional multi- 
variate normal distribution by 

^ ^ ^(27r)«|detS| 

The quadratic form appearing in the exponent is a proper dis- 
tance measure between the vectors x and fi and is called the 
squared Mahalanobis distance, which we denote by A^^: 



(3) 



One of the main statistical tools used in this paper is the 
approximation of distributions by normal distributions, using 
the technique of moment matching, whereby a distribution 
is approximated by a normal distribution with the same first 
and second order moments as the original. While conceptu- 
ally simple and easy to use in practice, this method is also 
statistically well-founded because it gives the approximation 
that minimises the Kullback-Leibler distance DKL{p\\q) := 
J dx p{x) log{p{x) / q{x)) between a given PDF p and the 
approximating normal PDF q. 

On the matrix analysis side, we will follow mathematical 
convention (and not the physical one) of denoting Hermitian 
conjugates with an asterisk A* instead of a dagger, and re- 
serve the dagger for the Moore-Penrose (MP) inverse: A'' . 
Complex conjugation is denoted by an overline: A. 

We will have the occasion to apply the following formula 
for the matrix inverse of a rank-A: correction to a non-singular 
matrix: 

{A + ucv*y^ 

= A-^ - A-^U {C-^ + V*A-^Uy^V*A-^ (4) 

Here, A is nxn and non-singular, C is fc x fc and non-singular, 
and U and V are general nx k matrices. This formula is alter- 
nately known as the Matrix inversion lemma, or the Woodbury 
matrix identity li26ll . 

While the main topic of this paper is the tomographic recon- 
struction of quantum states, maps and POVMs, objects that 
are typically represented by matrices (density matrices, Choi 
matrices, POVM elements), the reconstruction technique we 
use is based on vector representations of the state of the sys- 
tem. Therefore, more often than not, we will need to con- 
vert the usual matrix representation of the quantum objects 
to vector representation. For quantum states that means we 
will be employing a Hilbert space representation. The space 
M(i(C) of d X d matrices will be considered as a Hilbert 
space Hd equipped with the Hilbert-Schmidt inner product 



(^4, B) = Tr AB*. To distinguish between the two represen- 
tations, we write p for a density matrix, and p for its Hilbert 
space representative. 

While many different bases could be used for H, by far 
the easiest one for the purposes of this paper is the basis of 
matrix units {e^^}fj^i; in quantum physical notation e*^ = 
Converting a density matrix p to its Hilbert space rep- 
resentative p amounts to the so-called Vec operation, which 
works by just stacking the columns of the density matrix into 
a single -dimensional column vector: p — Vcc{p) := 

E 



The reverse operation is the Mat operation, 
which reshapes a d^-dimensional vector into a d x d matrix. 
The vector Vec 1 is denoted 1 1). That is, 1 1) = EiLi N) N)- 

In the same vein, the Hilbert representation of a linear map 
C (be it completely positive (CP) or not) acting on d x d 
density matrices, expressed in the basis of matrix units, is a 
d^ X matrix whose columns are the Hilbert space represen- 
tations of the matrices £(e*^). More specifically, if the map 
is a CP map and has the Kraus representation p ^ C{p) — 
^^j^ A^^^ pA^^^'* , then the Hilbert space representation of 
C is the matrix L given by 



B. The Dirichlet Distribution 

The Dirichlet distribution is the higher-dimensional gener- 
alisation of the beta distribution. The importance of this distri- 
bution stems from the fact that it is the conjugate distribution 
of the multinomial distribution. That is, \f F ^ Mtn(A^, p) is 
the distribution of F conditional on P = p, then Bayesian in- 
version yields that P conditional on F = / is Dirichlet with 
parameter /. Formally, the two distributions only differ by 
their normalisation. The multinomial is normalised by sum- 
ming over all integer non-negative / summing up to N, while 
the Dirichlet is normalised by integrating over the simplex of 
non-negative p summing to 1 . 

The general form of the PDF of a d-dimensional Dirichlet 
distribution with parameters ai is ( iIztIi . Chapter 49) 



/p(p) = r(ao)n 



where ao is defined as 



ao := ^ ai 

i=l 



(5) 



The range of P is the simplex pi > O. J^Pi ~ 1- I" our case 
ai-l = fi, and as J2^ = the PDF is 



iN + d-iy.f[Pt. 
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The mean values of the Dirichlet distribution are 

Oil 



and the elements of its covariance matrix are 

ai [aQ — ai) ■ ■ 



(6) 



— ajaj ■ I ■ 

3^g(ao + l)' * 



(/. + l)(jV+d-/,-l) . _ . 
(Ar+(i)^(Af+d+l) ' ' ~ ■> 

(7V+d)2(iV+d+l) ' ' ~r J 



(7) 



For further reference, we note a few properties of the co- 
variance matrix S of the Dirichlet distribution. First of all, S 
is singular; it has a zero eigenvalue pertaining to the eigenvec- 
tor 1 (1, .... 1). As the inverse of S does not exist we will 
need its Moore-Penrose inverse Because S can be written 
as a diagonal matrix minus a symmetric rank- 1 matrix, 

(TV + d) Diag(/ + !)-(/ + !)(/ + 1)* 

its MP-inverse can be calculated analytically. 

For non-singular differences of a matrix D and a rank 1 
matrix xx* , the matrix inversion lemma (HI provides the for- 
mula 



[D — XX 



D ' + (1 - x*D-^x)-^ D^xx*D^ 



Even for invertible D the difference can still be singular when 
x*D^x = 1. In that case the MP-inverse is given by 

{D-xxy = GD^G 

G = I - {x*D-^x)-'^ D-^xx*D-\ 

Here, G is an orthogonal projector on the support of I? — xx*. 
This gives in our case 

St = {N + d){N + d+l)GDiag{f + 1)-^G 
G = 1 - d-^1 1*. (8) 

Thus G is the projector on the subspace of vectors x for which 
Xi — 0. In other words, on the subspace of differences of 
probability vectors, reduces to the diagonal matrix 

{N + d){N + d+l) Diag(/ + 1)-^ 



C. Bayesian Inference 

Our reconstruction procedure essentially amounts to per- 
forming Bayesian inference, in conjunction with an approx- 
imation scheme for the statistical properties of the measure- 
ment process. More precisely, the measurement process is ap- 
proximated by a so-called linear-Gaussian model. Within the 
confines of this model, the actual calculations for perform- 
ing the Bayesian inference turn out to be very simple and are 



given by the update equations of a Kalman filter; this will be 
explained below. In this section we briefly describe the ele- 
ments of Bayesian inference. For an in-depth treatment we 
refer to the excellent introductory work |28]. 

We describe the state of the system under investigation by 
a vector and denote it by x. For the time being, we ignore the 
fact that the system is a quantum system. As our knowledge 
of X is obtained from (quantum) measurements and is statis- 
tical in nature, we describe it by a random variable, X. Since 
in our setup measurements are independent (each measure- 
ment operates on a private copy of the quantum state under in- 
vestigation), the reconstruction procedure can be decomposed 
as an iterative scheme in which each measurement is incor- 
porated sequentially. We assume that in each iteration any 
prior knowledge about X, as well as any knowledge obtained 
through the outcomes of the first m — 1 measurement set- 
tings has been incorporated into the probability density func- 
tion fx- In Bayesian terminology the PDF of X is called 
the prior PDF. The goal of the inference procedure is to come 
up with a posterior PDF that incorporates the knowledge ob- 
tained by the measurement outcomes in the m-th measure- 
ment setting. We use the random variable X' to describe the 
updated knowledge; its PDF fx' is called the posterior PDF. 

We denote the "knowledge obtained through a measure- 
ment" by a vector z describing the measurement outcome. 
This vector 2 is a sample of the corresponding random vari- 
able Z. It can give us additional information about the system 
via the statistical relation between Z and X, which we ex- 
press as the conditional PDF fz\x- This conditional PDF is 
the statistical description of the measurement model linking 
X to Z; it will be specified further in section HIlB I The pos- 
terior fx' is then nothing but the conditional PDF fx\z- 

At the heart of any Bayesian inference procedure lies 
Bayes' rule, which expresses the relation between fx\z ^nd 
fz\x' 

, , I . fz\xiz\x)fx{x) 
fx\z{x\z) = —- . 

While fz is defined as the marginal of fx,z = fz\xfx, it is 
much more convenient to interpret fz as a normalisation con- 
stant, ensuring that fx\z integrates to 1 over the probability 
space of X. Second, as the main random variable in Bayes' 
rule is X, while z plays the role as parameter and is given by 
the observation, we have to consider fz\x ^s a function of x 
too. As a function of its second argument, fz\x is no longer 
a conditional probability density but a likelihood function. We 
will denote this function by Lx\z- Because of the explicit 
normalisation in Bayes' rule, L is defined up to proportional- 
ity. Note the reversal of the arguments, which is customary in 
the Bayesian literature and reflects the change of focus from 
Z, being the measurement outcome causally related to the un- 
derlying state X, to X, our guess of what the state should be, 
given the measurement outcome Z = z as evidence: 



Lx\z{x\z) oc fz\x{z\x). 
We will henceforth rewrite Bayes' rule as 
fx\z = C Lxizfx- 



(9) 
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The Bayesian inference step, incorporating z as new knowl- 
edge, is then expressed as 

U\^)^CLx\z{A^)!x{^)- (10) 

For a sequence of measurement settings (Zi)"^j^, this step 
has to be iterated n times, leading through a sequence of n + 1 
PDFs that describe the state X in a way that is consistent with 
the additional knowledge obtained through the measurements. 
If we describe the prior information by the PDF of Xq, the i- 
th measurement by the likelihood function with parameter Zi, 
and the updated information after i iterations by the PDF of 
Xi, we get 

/x„(a;) = CLx„_i|z„(a;Nn)-^x„_2|z„-i(^N«-2)--- 
ixo|z,(^Ni)/x„(a;). (11) 

As one can see, this is merely a product of the n likelihood 
functions and the prior PDF, and can therefore be calculated 
in any order. This will turn out to be important later on. 

Despite the apparent simplicity of the Bayesian update for- 
mula, actual calculations based on it are in general very com- 
plicated because the variable X appears both as main variable 
of a continuous PDF (the prior) and as continuous parameter 
of the (discrete) likelihood functions. The resulting product is 
in general an extremely complicated continuous function of x. 
In the context of reconstruction of tomographic data, for ex- 
ample, the likelihood functions are polynomials of very high 
degree. 

The calculations do become feasible in the specific case 
of so-called Linear-Gaussian Models. In such models the 
dynamics of the system can be described by a time-discrete 
Markov chain with a linear evolution operator perturbed by 
zero-mean Gaussian noise. Similarly, the measurement also 
depends linearly on the system state and any perturbation must 
also be zero-mean Gaussian noise. In other words, all vari- 
ables (Xq, and all Zi) are normally distributed, and the pa- 
rameter X enters only in the value of the mean of /z;|Xi_i; 
moreover, it does so in a linear way only. A typical example 
is a classical measurement whose output depends linearly on 
the state and is perturbed by additive white Gaussian noise. 

For these models, and for more complicated dynamical 
models where the state X varies over time, the Bayesian up- 
date formula reduces to a set of simple equations called the 
Kalman filter equations. A Kalman filter is the optimal state 
estimator when the system and the measurement can be mod- 
elled by linear-Gaussian models. The Kalman filter equations 
consist of two sets of equation; one set (the predictor equa- 
tions) predicts the evolution of the Markov chain, while the 
other set (the update equations) updates the state of the sys- 
tem based on the measurements taken. For the purposes of 
the present paper only the update equations will be used be- 
cause the basic assumption of tomography is that the system 
is static. A very good account of Kalman filtering is given in 
Ref. S. 

The reason for the feasibility of the linear-Gaussian model 
is that all distributions occurring in the calculations are nor- 
mal, including the intermediate products of the factors of ( fTTT i. 
This will be explained in the next section, where we describe 
the Kalman update equations in detail. 



D. Kalman Filtering for Static Systems 

Let us return again to the Bayesian update formula (fTOl l 

fx'ix)^CLx\zix\z)fxix). 

A linear-Gaussian model can be represented pictorially as fol- 
lows; 

X Ji^Y +^'")®^ z 

For the purposes of this paper it turns out to be beneficial to 
split the model into two parts: a linear mapping, represented 
by a matrix H, followed by the noise process, which consists 
simply of adding zero-mean Gaussian white noise with given 
covariance matrix 0. A further model assumption is that the 
prior PDF fx is Gaussian, so that the posterior PDF will be 
Gaussian, too. 

Suppose now that an observation of Z is made, giving the 
value z. Bayesian inference of the noise process then yields 
that the distribution of Y conditional on this observation will 
be Af{z, &). In other words, the moments of Y are given by 

^i{Y) = z (12a) 
= 0. (12b) 

We are now left with performing Bayesian inference on the 
linear part, with the variables distributed as 

X' - AA(/x',S') (13) 
X ^ AA(/x,S) (14) 
Y ^ AA(Mr),E(F)). (15) 

Here, fi and S are the already known first and second order 
moments of X (mean and covariance matrix) and fi' and S' 
are the unknown first and second order moments of X'. 

Taking into account the explicit formula (|2]l for the PDF of a 
Gaussian distribution, the logarithm of the hkehhood function 
is given by 

-^[y-^,iY)]*J:iY)-'[y-^,iY)] 

plus some constant. We get similar expressions for the loga- 
rithm of fx' and the logarithm of fx ■ Combining everything, 
dropping the factors —1/2, and using the relation y = H x, 
the Bayesian update formula ( fTOl i for the linear mapping be- 
comes: 

{x - fj,')*T,''\x - fj,') 
= c+[Hx- fi{Y)]*j:{Y)-^[Hx- fM{Y)] 

+ {x - fiyT,-\x - fi). (16) 

Here, all additive constants have been absorbed in the constant 
c. This constant is actually irrelevant because it is ultimately 
absorbed in the Bayesian normalisation factor C of (fTOl i. Both 
sides of the equation are therefore degree-2 polynomials in 
X, which confirms our earlier statement that all distributions 
occurring in the calculations for linear-Gaussian models are 
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Gaussian. Eliminating x from this equation gives us the two 
update equations we need, one for the mean, and one for the 
covariance. 

Remark. Although the probability space of X is a real vec- 
tor space, the vector entries of x need not be real. This will 
give us more freedom in choosing a basis when X is a Hilbert 
space representation of density matrices; to get real vector en- 
tries one is forced to choose Pauli matrices (or generalisations 
thereof) as basis vectors. This is the reason why we have ap- 
plied the Hermitian conjugate * rather than the transpose. 

Combining the terms that are quadratic in x yields the 
equation 

S'"^ = H*Y.{Y)-^H + Y.-^. 

Inverting gives the solution for S' [using the matrix inversion 
lemma, Eq. ©I 

S' = [i?*S(y)-iff + (17) 
= S-Sff*[ifS/f* + I](r)]-ii?S. (18) 

The factor T,H*{H1:H* + is customarily called 

the Kalman gain factor and is denoted by K. 

Combining the terms linear in x yields the equation 

with the solution for /x': 

^l' = s'[i?*i](r)-V(^) + s-V] 

= + (19) 

In the last line we have used equation (fTSl l and the easily ver- 
ified relation (S - KHY,)H*Y.{Y)-'^ = K. 

The three relevant equations are commonly known as the 
Kalman update equations, and they form the backbone for the 
method of this paper Combined with the equations (fT2l i for 
the moments of Y they read 

K = + ©]-i (20a) 

/x' = (1 + K{z - Hfi) (20b) 
S' = S - KHi:. (20c) 

It is instructive to consider the special case where the mea- 
surement mean y is exactly the state x, i.e. = 1. This 
gives the simplified equations 

K = S(E + 0)^i (21a) 
fi' = fi + K{z-fi) (21b) 
S' = S-iSTS. (21c) 

After some algebra one finds that S' is given by the parallel 
sum of S and 0: 

In other words, in this special case, the inverses of the covari- 
ance matrices add up. 
Remarks. 



1. In the expression for the Kalman gain we use a ma- 
trix inversion, and that requires invertibility of the ar- 
gument. In our setting (quantum mechanics), it turns 
out that the argument is actually never invertible. The 
reason for that is a certain set of exact constraints that 
the state has to obey. For example, when the state is a 
quantum state, its trace must be equal to 1 . In addition, 
the measurement vector also has to satisfy certain exact 
constraints. For example, in an TV-run experiment, the 
number of clicks must add up to N. This eventually has 
an impact on S, H and 0, causing HUH* + to be 
non-invertible. How to deal with this will be described 
later on, in Section lTlI Gl 

2. From an analytic viewpoint it is not readily clear that 
the expression ( I20cb for S' always yields a positive 
semi-definite matrix. More importantly, the expression 
is not the best one from a numerical viewpoint; since a 
subtraction is made, numerical roundoff may produce a 
non-positive semi-definite matrix. This is more likely to 
happen for precise measurements, yielding small vari- 
ances. For that reason, the alternative formula ( [TT] ) in- 
volving addition rather than subtraction is preferable. 
In that form it is also obvious that the obtained S' is 
always positive semi-definite. 



III. KALMAN FILTER RECONSTRUCTION OF 
QUANTUM TOMOGRAPHIC DATA 

In this Section we present the basic theory of our recon- 
struction method, based on the concepts of Bayesian inference 
and Kalman filtering, which were described in the previous 
Section. This Section contains the bulk of the material, and 
includes the mathematical underpinnings of our method. To 
assist the readers who are more interested in the method itself 
and how to apply it, we have included Sec. IIII Al containing 
a self-contained explanation of the method. Those readers are 
advised to read that Section only and then fast-forward to the 
applications and discussions Sections UVUVj andl VII 

We start in Sec. IIII B I with a characterisation of the like- 
lihood function for quantum measurements (characterised 
themselves by a POVM {11*^''-'}) and obtain a normal ap- 
proximation that allows to approximately fit a quantum mea- 
surement in the mould of a linear-Gaussian model. We find 
that incorporating the information obtained from the quantum 
measurements into the likelihood function amounts to apply- 
ing the Kalman filter update equations ( |20l ) where the entries 
of the measurement mean z are given by formula those 
of the measurement covariance by and the measure- 
ment matrix H by the matrix representing the linear mapping 

Apart from calculating the likelihood, Bayesian inference 
requires a choice of a prior, and this is partially treated in 
Sec. unci In a full treatment one would have to incorpo- 
rate the structure of the physical set into the prior, i.e. the 
prior should be zero outside of the set of physical states. This, 
however, causes the prior to be non-Gaussian and it therefore 
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does not fit the requirements of Kalman filtering. For that rea- 
son, the restriction to the physical set will be done in a post- 
processing step, as described in Sections IlII El and IIII Fi and 
instead a Gaussian dummy prior is chosen. Directly after the 
Kalman updates, a simple correction step removes the effects 
of this dummy prior again [Sec. IIII Cl Eqs. dZTl l and (|28]|1. At 
this point, the eigenvalues of the corrected S, before restrict- 
ing to the physical set, already give useful information as they 
are variances of certain linear combinations of state compo- 
nents and give an indication about how many more measure- 
ments are needed to reduce the measurement error. For exam- 
ple, to bound the maximal error on any state component from 
above, the largest eigenvalue of the covariance matrix should 
be bounded. 

Once the posterior PDF, restricted to the physical set is cal- 
culated, via its first and second order moments, one can cal- 
culate the confidence region for the reconstruction, i.e. the 
set within which the actual state should lie with high prob- 
ability (say, 95%). The basic quantity expressing this confi- 
dence region is the Mahalanobis distance. This is explained 
in Sec. lIITD] 

Next, the problem of restricting the posterior PDF to the 
physical set is treated. This is actually a very difficult numer- 
ical problem, especially in high dimensions, and turns out to 
be a challenge even for current state-of-the-art Bayesian inte- 
gration methods. In Sections [III El and llll FI we give two sim- 
ple algorithms that perform the task reasonably well, if one is 
willing to give up exactness of the solution. 

For most quantum tomography problems, the physical set 
is partially defined by exact constraints. For example, quan- 
tum states have trace equal to 1. In Sec. IIII Gl we show how 
such exact constraints are best dealt with, in order to avoid 
numerical problems. The Kalman update equations have to 
be slightly modified. 

Finally, Sec. IIII HI deals with the issue of graphically rep- 
resenting the calculated results in a meaningful way, based 
on mean values and error bars. The first moment of the pos- 
terior PDF roughly corresponds to the maximum likelihood 
solution, and as frequently happens with this kind of solu- 
tions, exhibits reconstruction artifacts, which are not features 
of the actual state, but are not ruled out by the measurements 
either. A number of methods have been developed to remove 
these artifacts, all based on maximisation of entropy or related 
functional, and we discuss these in our context. 



A. Overview 

In this Section we present a self-contained overview of our 
method for the purpose of implementation. Mathematical de- 
tails, as well as the underlying rationale are explained in sub- 
sequent sections, which could be skipped on first reading. For 
the sake of clarity we restrict ourselves to the setting of state 
reconstruction. 

The first step in implementing the method is to gather all 
relevant information about the tomography process and cast 
it in an appropriate mathematical form. The following are 
needed: 



• dimension D of the state (or of its representation); 

• the various sets of POVM elements {Il^-'^lj, one set per 
measurement setting; 

• the number of runs N per measurement setting, if ap- 
plicable; in continuous wave (CW) experiments, there 
is no such N; 

• the measurement data: the frequencies fj (the "number 
of clicks") of each of the outcomes; 

• any exact linear constraints on the state; by default, 
Tr p = 1 is the only such constraint; 

• any exact linear constraints on the measurement out- 
comes; for example, J^j fj — 

• a statistical model of the measurement process in the 
form of a PDF of the frequencies fj in terms of the 
POVM elements, N, or any other parameter; typically, 
this PDF is multinomial or Poissonian; 

• a reference state po satisfying the linear constraints on 
the state; typically, this would be the maximally mixed 
state, Po — ^d/D. 

The reconstruction algorithm consists of a number of 
phases, and we describe each in the following. It is important 
to note that we represent states by vectors. The most conve- 
nient basis is the standard basis, in which p is represented by 
the Z)^-dimensional vector Vec(p). 

1. Setup phase 

The exact linear constraints are enforced through the use 
of two projectors Tx and Tz- These have to be calculated 
first. The projector Tx is a projector on a subspace Sx in 
state space (that is, the vector space representation thereof), 
while Tz is a projector on a subspace Sz in measurement 
space (the space of measurement vectors z). In principle, 
the latter could differ per measurement setting, but we will 
assume here that it does not. The interpretation of the sub- 
space Sx is that a state p satisfies the linear state constraints 
if and only if Vec{p — po) is in the subspace Sx- For exam- 
ple, if the only linear constraint is Tr p = 1, then Sx is the 
space of vector representations of Hermitian matrices of trace 
zero; that is, the space of D^-dimensional vectors orthogonal 
to xo = Vec(po) = Vec(]l d /D) = | Id) /D. We will hence- 
forth represent a state p by the vector Vec(p — po) in Sx- The 
reference state po is thus represented by the null vector 

The interpretation of Sz is similar. If, for example, the only 
linear constraint on the measurements is fj — N, result- 
ing in the constraint J^j = 1' then Sz is the space of real 
vectors (with dimension equal to the number of measurement 
outcomes) whose components sum up to zero. 

The corresponding projectors can be calculated by con- 
structing orthonormal bases (onb) supporting each of these 
subspaces. If {xi} is an onb for Sx, we construct the ma- 
trix Xi whose columns are the vectors Xi. This matrix is a 
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so-called partial isometry. The projector Tx is then given as 
Tx = XiX^. Similarly we have Tz = ZiZ*, where the 
columns of Zi form an onb for the subspace Sz- In the actual 
algorithm we don't need the projectors but only these partial 
isometries Xi and Zi. 

Conversely, if the projectors are given, we can calculate 
Xi and Zi from them via the singular value decomposition 
(S VD). Let Tx — USU* be the singular value decomposition 
(SVD) of Tx, where U is unitary and S is diagonal, the diag- 
onal elements of which are the singular values of Tx- For a 
projector Tx of rank k, the first k singular values are 1, while 
the others are zero. Then Xi consists of the first k columns 
of U. The partial isometry Zi is calculated in the same way 
from the SVD ofT^. 

For the example where the linear state constraint is Tr p = 
1, the projector Tx is easily constructed as Tx = — 
|11d)(11d|/£'- Using an SVD, this gives Xi. Similarly, when 
the measurement constraint is J2j=i — 1' ^^'^^ d the num- 
ber of outcomes, Tz = 1^ — |l)(l|/c?, and Zi is also calcu- 
lated using an SVD. 

The exact constraints can be dealt with by expressing the 
relevant quantities in terms of the bases Xi and Zi . A tilde 
will be used to indicate this. For example, a state p satis- 
fying the constraint can be represented by the tilde vector 
X = Xl{x - Xq) = XI Vec(p - po)- 

Initial mean value and covariance: Let k be the rank of 
Tx ; thus kis minus the number of independent constraints 
on p. The initial mean value is set to the /c-dimensional null 
vector fio — 0, representing the reference state po- The initial 
covariance matrix is Sq = bik, where 6 is a scalar with a 
"large" chosen value. 



2. Kalman Filter phase 

The following is applied iteratively, for each measurement 
setting: 

The measurement matrix H represents the POVM elements 
of the current measurement setting. The j-th column of H is 
given by Vec(n(-')). From H we calculate H = Z^H Xi, to 
incorporate the exact state and measurement constraints. 

The reference measurement vector is then Zq = H Xq. The 
measurement vectors will be represented by the tilde vectors 
z = Zi{z — Zq). The original measurement vector z is de- 
rived from the actual measurements fj, along with the mea- 
surement covariance matrix 0, as follows. 

For a measurement with multinomial statistics (as for N 
runs of single-photon tomographic measurements) z is given 
by Zj = {fj + l)/{N+d) (formula Q), and by the formula 
(|7]i. The tilde quantities then follow using z ~ Zl{z — Zq) 
and = Z*0Zi. 

For measurements with Poissonian statistics, in the case 
that the POVM elements add up to either the identity matrix 
or a scalar multiple thereof, the same formulas hold, but with 
N given by iV = fj- When the POVM elements do not 
add up to a scalar matrix (a situation that better be avoided). 



the modified formulas ( |23] ) and ( l24b have to be used for z and 
0. 

We now apply the Kalman filter update equations ( |45] l, writ- 
ten entirely in terms of tilde quantities: 

k = tH*{Hl:H* + e)-^ 

p.' = fl + k{z - Hfi) 
S' = i: KHfl. 

For the first iteration, we set ft ~ p,Q and S = Sq. The 
primed quantities are the updated values and have to be used 
as fl and S in the next iteration. 

After the final iteration, the "untilded" quantities can be cal- 
culated, if one so wishes, using the formulas fj,' = fiQ + Xip,' 
and E' = XiS'Xi*. 



3. First Interpretation 

The Kalman filter reconstruction procedure yields a Gaus- 
sian distribution over the linear space containing the physical 
state space as a subset. In most cases, this distribution will 
have a non-negligible probability outside this physical set. In 
fact, more often than not the mean of the distribution will be 
non-physical. At this point, the reconstructed mean and co- 
variance only summarise what the measurements tell us, re- 
gardless of the physical significance of the outcome. In the 
next phase, the physicality constraint has to be combined with 
the reconstruction, as a kind of prior knowledge. 

However, the unphysical covariance matrix already gives 
us interesting diagnostic information about the tomography 
itself, namely about the inherent accuracy of the measure- 
ments. To that purpose one can investigate the eigenvalues 
of the covariance matrix S. If some (or all) of these are large, 
that means the tomography is not able to give accurate infor- 
mation about the state in the direction of the corresponding 
eigenvectors. Application 2 (Section [V| gives a particularly 
nice example of this. 



4. Restriction phase 

Restricting the reconstruction to physical state space can be 
done in a number of ways, but is always more complicated 
than in the MaxLik case, because the covariance matrix also 
has to be treated. The easiest way is to first calculate the max- 
imum likelihood physical state. This is the state pml satisfy- 
ing Tr ptv/l = 1 and pml > and minimising the squared 
Mahalanobis distance — {p — fJ')*'^^^{p ~ fJ') to the re- 
constructed unphysical mean state. In the presence of exact 
constraints, these calculations are best done using tilde quan- 
tities. This is a constrained minimisation problem that can be 
reformulated as a semi-definite problem (SDP) (see Section 
nil Eb and can be solved efficiently using SDP solvers. 

The obtained minimal Mahalanobis distance A^^/^ has di- 
agnostic value. If nothing has gone wrong, the MaxLik so- 
lution should be well within the confidence region of the 



10 



reconstructed likelihood distribution. Taking a 95% confi- 
dence value, the confidence region is given by the inequality 
< Iv — 1/2 + 1-5)^, where v is the dimen- 

sion of the subspace Sx supporting the reconstructed state p 
(the number of degrees-of-freedom). If the value of A^^/^ is 
much larger than that, this indicates that something has gone 
wrong either with the tomographic measurements or with the 
reconstruction, in the sense that the underlying assumptions 
are violated, for example if certain additional noise sources 
haven't been accounted for. Application 1 illustrates this as- 
pect very clearly. 

If M\ij^ falls within the confidence interval related to the 
unphysical reconstruction, we can calculate the confidence 
region for the physical restriction. A good approximation 
for that region is given by the intersection of the ellipsoid 
{p- PML)*^~^{p- Pml) < {^/iZ + MmlY =■■ 7^ with 
the physical set. A drawback of this approach is that to get er- 
ror bars the intersection has to be calculated explicitly, which 
is a non-trivial task. 

In Sec. IIIIFI we present an alternative algorithm for per- 
forming the restriction to the physical set. This algorithm 
does not rely on an SDP solver and, furthermore, yields an 
explicit confidence region, allowing to calculate error bars in 
a straightforward way. 

Finally, it is possible to calculate a regularised solution. 
This is a physical state within the physical confidence region 
that optimises a certain regularisation functional. Possible 
choices are the entropy, and then we get the so-called max- 
imum entropy (MaxEnt) solution, or a functional expressing 
the smoothness of a solution. An example of the latter is given 
in Application 2. The calculations for this again require solv- 
ing a semi-definite program (see Sec. lIIIH~4] l. 



B. Approximation of the Measurement Process in Quantum 
Tomography by a Linear-Gaussian Model 

1. The Likelihood Function in Quantum Tomography 

Any quantum measurement, be it in state tomography or 
process tomography, can be characterised by the application 
of a POVM {ne^)}^^ to a certain state p; this state could 
be the state under investigation, or the output of a quantum 
process given a certain applied input state. From the point 
of view of the experimenter, the state p is initially unknown, 
even though the experimenter may have certain preconcep- 
tions about it. Because the tomographic experiments reveal 
information about the state of a statistical nature, the state has 
to be treated as a random variable. Henceforth, p will denote 
an observed quantum state corresponding to a random vari- 
able denoted by R. 

Quantum mechanics predicts the probabilities of each of 
the K POVM outcomes on a state p to be pu = Tr[pn(''']. 
We define the vector of probabilities as p = {pk)k=i- When 
the state is described by a random variable R, the vector p 
is an observation of an underlying random variable P, with 
Pfc = Tr[i?n('^)]. 

In reality, P is never observed directly. We will consider 



two types of optical tomography experiments in this paper 
In pulsed mode tomography experiments, N individual light 
pulses are sent into the system, each pulse prepared in a state 
p. The POVM measurement is repeated N times, presumably 
on a sequence of N independent identically prepared states p. 
For every pulse, a detector either clicks or does not click. The 
results of these N runs can then be combined into a vector of 
frequencies / ~ {fk)k=i of the respective outcomes. This 
vector is an observation of a random variable, F. As is well- 
known, for fixed N and p, F has a multinomial distribution 
with parameters TV and p: F ^ Mtn(iV; p). 

In continuous wave ( CW) mode optical experiments, the in- 
coming laser beam is turned on for a relatively long but fixed 
time, and the number of times the detectors click in that time 
span are taken as the frequencies fk- The elements Fk are 
Poisson distributed with mean value pk — Apk, where A is 
a proportionality factor called the brightness factor. This in- 
corporates the intensity of the laser beam, the duration of the 
experiment, detector losses, etc. Obviously, the sum of fre- 
quencies N — fk is not fixed but is a Poissonian random 
variable as well. 

Combining all this with the relation Pk = Tr[i?n('^'] we 
obtain the PDF fF\R{f\p) of F conditional on R, or the like- 
lihood function when considered as a function of p. Pictori- 
ally, we have the following (for pulsed mode experiments): 

p Mtn(Af,p) ^ 

The first step is a linear mapping, and the second step is the 
quantum noise model. In comparison, recall that Kalman fil- 
tering is based on the linear-Gaussian model: 

The first step is again a linear mapping, but the second step is 
an additive Gaussian white noise (AGWN) model. 

As mentioned, the basic idea explored in this paper is to 
approximate the quantum model by a linear-Gaussian model 
in order to open the door for Kalman filtering. To do that, 
the following incompatibilities have to be overcome: first, in 
linear-Gaussian models there are typically no restrictions on 
the state vector X, while in quantum mechanics R is con- 
fined to quantum state space (positive semi-definite and trace 
equal to 1). We postpone the solution to this problem until 
Section lTlI El and iust pretend for the time being that the vari- 
able R is unrestricted. 

The second difference is of course the different noise 
model. While in both measurement models the first step is 
a linear mapping, the quantum noise model is non-additive 
and non-Gaussian. In spite of this apparently rather large dif- 
ference, the simplicity of the Kalman filter equations is so ap- 
pealing that one is enticed to try and approximate fpiR by a 
linear-Gaussian model anyhow. Indeed, many distributions, 
including the multinomial and Poisson distributions, can be 
approximated by a normal distribution, and according to the 
law of large numbers the approximation gets better when the 
number of observations increases. Incidentally, this is why we 
have imposed the requirement that the number of experimen- 
tal runs per measurement setting should not be too small. 



11 



So the main problem we are faced with is to reconcile the 
two models in a statistically sound way, but without losing 
sight of the practical issues. In the next subsection we first 
present a deceptively simple "solution", one that comes to 
mind almost automatically, but which suffers from a num- 
ber of serious drawbacks. An observation that is more than 
200 years old will then provide a way out of this conundrum, 
paving the way to a more satisfactory solution. 



2. A naive approach 

If we make the straightforward identifications Z — F, 
Y = P and X — R, then Lfi^p{p\f) provides the likelihood 
function Lx\z required for the Bayesian update formula ( fTOb . 
There are two problems with this, however, preventing a direct 
mapping to a linear-Gaussian model: 

1. P enters in the moments of F of all order, and not just 
the mean. 

2. P enters in these moments in a highly non-linear way. 

A first naive approach could be to simply approximate the 
multinomial distribution by a multivariate normal with mean 
Npk = iVTr[pn('^^] (which is linear in p, as required) and 
with covariance matrix the one obtained by taking the covari- 
ance matrix of the multinomial distribution and replacing ev- 
ery occurrence of pk by fk/N (which is independent of p, as 
required). 

Although this superficially seems to solve the above prob- 
lems, a serious drawback of this approach is that the assign- 
ment of the covariance matrix is very ad-hoc; for example, pk 
is replaced by its estimator fk /N in the covariance but not in 
the mean. Even more importantly, this approach is statistically 
ill-founded and, in fact, underestimates the actual variance of 
F. 

This is most apparent when some of the components of / 
are zero. Indeed, consider an iV-trial 2-outcome measure- 
ment, where / = (/o,/i) = {fo,-N - fo), and suppose 
/o = 0. In the naive approach, the variance assigned to fpiR 
would be Np{l — p), with p replaced by fo/N, hence giv- 
ing zero. This is clearly a mistake because a variance of zero 
amounts to perfect knowledge, and a confidence interval of 
zero width. However, never having seen outcome '0' is no 
guarantee that '0' will not occur in the future, no matter how 
high the value of N may be. This has also been noted in 
Ref. B\. 

The first documented encounter of this phenomenon ap- 
peared in a 1774 paper by Laplace, as the so-called "sun- 
rise problem": calculate the probability of a sunrise, solely 
based on the information that it has risen N days before ll29tl . 
The answer is not 1 . Instead, the correct value of this condi- 
tional probability is given by a formula that is now known as 
Laplace's rule of succession (see, e.g. Ref. ifsoll ). In a more 
wider context we can consider the "visible sunrise problem" 
and calculate the probability p that we can see the sun rise (un- 
hindered by clouds), given that we have done so in / of the N 
days before. In the modern interpretation of Laplace's rule. 



p is a random variable with a uniform prior, and conditional 
on the N observations has a posterior PDF that according to 
Bayes' rule is a beta-distribution with parameters / and N—f, 
whose mean is (/ + 1)/ {N + 2). While useful for predicting 
sunrises, beta distributions will also offer the solution to our 
reconstruction problem. 

3. Bayesian Solution 

Essentially, Kalman filtering can be seen as Bayesian infer- 
ence, simplified to the case of linear-Gaussian models. When 
the noise is no longer Gaussian, as in our case, but we still 
want to reap the benefits from the simplicity of the Kalman fil- 
ter equations, we really should be looking at the Bayesian in- 
ference equations and suitably approximate these, rather than 
approximate the model and apply Kalman filtering to that. In 
this way we can avoid the problems of the naive approach. 

More precisely, what we will do is match the two models 
after Bayesian inversion of their noise processes. Recall, for 
the linear-Gaussian model this gave 

X^Y, Y ^M{^l{Y),E{Y)), 

with the moments of Y determined by the observation z, 
Eqs. (fT2] i. For the quantum measurement model we have 

j^n^ p Mtn(jy,p) ^ 

Bayesian inversion yields the PDF of P conditional on the 
observation / of F. As explained in Section IIIBI this is a 
Dirichlet distribution with parameter f,P^ Dirichlet(/), 
and moments given by ^ and 

The solution to the matching problem has now become very 
simple. We match the partially inverted quantum measure- 
ment model to the partially inverted linear-Gaussian model, 
and to do so we approximate the Dirichlet distribution by a 
Gaussian distribution with same first and second order mo- 
ments (moment matching). The upshot of all this is is the 
following rule: 

In the Kalman filter update equations ( |2Qt replace z by for- 
mula dS]), and by 0. 

Remarks. 

1. In our context, the formula for the mean of a Dirich- 
let random variate (Laplace's rule of succession) could 
be paraphrased as "each outcome gets one click for 
free". In statistics this extra count is called a pseudo- 
count [31]. In comparison, the mode (the position of 
the maximum of the PDF) is given by pi = fi/N. 

2. We would like to point out that in maximum hkelihood 
reconstruction one takes the mode as the basic quantity 
(the relative frequencies of the outcomes), as that is the 
point of maximum likelihood, while in our approach we 
use the confidence region, which is approximately cen- 
tered around the mean (the modified relative frequen- 
cies, with pseudocounts included). To counter poten- 
tial objections against this approach we already mention 
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here that, for any non-unreasonably small confidence 
value, the mode is well within this confidence region. 
This will be shown in the appendix. Section IbTI 

3. In fact, even if one is not going to calculate the confi- 
dence region, Laplace's rule tells us that one should re- 
ally use the modified relative frequencies, because "the 
mean of a posterior can be thought of as being more 
representative [than its mode] as it takes into account 
the skewness of the PDF." (|28], p. 25). 

4. In this whole discussion we have quietly disregarded 
the fact that in the quantum setting the probabilities 
P do not necessarily range over the whole probability 
space. The range is essentially determined by the rela- 
tions pj — Tr[/3n(-')]. Thus, to be completely correct, 
all Bayesian integrations should be carried out over this 
range. However, in many cases, integrating over the ex- 
act range complicates the calculations too much. On 
the other hand, in the quantum tomography setting, per- 
forming exact integrations does not guarantee that the 
final solution, in terms of the state, belongs to the phys- 
ical set anyway. Therefore, we integrate over the full 
probability space (the d-dimensional probability sim- 
plex) and restrict the solution to the exact physical set 
afterwards (see Sections 1111 El and 1111 Fb . More about 
this is discussed in Section lTllB 51 

We have illustrated our approach here for the case of pulsed 
experiments, where the distribution of clicks is governed by 
the Multinomial distribution. More generally, the approach 
can be described as follows. Let / be the PDF of the distri- 
bution of the outcome frequencies, conditional on the proba- 
bilities p. From this, derive the conjugate PDF, i.e. the PDF 
of P conditional on the observed frequencies. Then approxi- 
mate this conjugate PDF by a Gaussian using moment match- 
ing. This amounts to replacing z and in the Kalman update 
equations by the first and second order moments (which are 
functions of the observed outcomes) of the conjugate PDF, re- 
spectively. 



4. Poissonian counts 

Consider, as a second example, CW experiments, where the 
outcome frequencies are governed by a Poisson distribution. 
More precisely, the frequencies fj of the outcomes are inde- 
pendent Poisson variates with parameters jij = Apj, where A 
is the brightness factor of the experimental setup. 

The conjugate distribution of the Poissonian f{k\p) = 



e~'^^i''/k\ with fi = Apis the PDF 



fip\k) 



Ae-'^P{Ap)'' 
k\[l-Q{k + l,A)] 



(22) 



where Q(fc+1, A) is the regularised incomplete Gamma func- 
tion Ba . 

We can apply this to find the PDF of P conditional on the 
outcome frequencies fj. While the latter are independently 



Poissonian distributed, the pj have to add up to 1 and are 
therefore correlated. Thus, the PDF of P equals the product 
Yij fiPjlfj) renormalised to 1 over the probability sim- 
plex of p. A short calculation yields the surprising result that, 
again, the conjugate PDF is given by the Dirichlet distribu- 
tion, with N = J2j=i fj- Maybe even more surprising is the 
fact that the brightness factor A cancels out completely. This 
is rather convenient, since, in general, A is not known, or at 
least not with great precision. We can therefore carry over the 
formulas for the pulsed mode case to the CW case wholesale, 
with the one addition that TV has to be expHcitly defined as 



5. Non-POVM Measurements 

As is well-known, the most general measurement one can 
perform is a POVM measurement, described by POVM- 
elements, positive-semidefinite operators that add up to the 
identity operator. In practical experiments, however, one is 
not bound to implement the full POVM. For example, one 
could just implement one element of the POVM at a time, 
make N measurements with it, and leave the other elements 
for subsequent runs. Under the assumption that the mea- 
surements are always made on identically prepared state, this 
makes no difference in the end result (the vector of outcome 
frequencies). Because of this, one can simulate measurements 
that cannot be performed in a single shot, namely POVM 
measurements where the elements do not add up to identity, 
as long as the elements 11 '^^^ themselves obey the condition 
< n'^"'^ < 1 (so that they form part of some POVM-proper). 

However, for the purposes of tomographic reconstruction, 
in particular for the kind we consider here, this potentially 
poses a problem in the CW case. When the POVM elements 
add up to a multiple of identity, the unknown brightness factor 
A drops out of the calculations, just as in the case of proper 
POVMs. When the elements do not add up to a multiple of 
identity, this is no longer the case, and the calculations become 
more complicated. The brightness factor A is now a so-called 
nuisance parameter and has to be removed from the likelihood 
function by integrating it out, as shown below. 

This situation has already been considered in Issi [34ll for 
Maximum Likelihood reconstruction. It was noted there that 
the sum of the POVM elements, the matrix E^^^ J2j n^-'^ 
determines the field-of-view of the tomography experiment. 
That is, if H^"^ is supported only on a restricted subspace, 
the reconstructed state will also be supported on that subspace 
only. The tomography will be "blind" to state components 
outside that subspace. Moreover, the eigenvalues of II'^^ de- 
termine the sensitivity of the tomography along the corre- 
sponding eigenvectors. The larger the eigenvalue, the more 
accurate the tomography in that direction will be. 

When the POVM elements 11*^^^ no longer add up to iden- 
tity, the corresponding "probabilities" = Tr[pn'^-')] do not 
add up to 1 either Let us then define po — Simi- 
larly, define /o — J2'j=i fj- The maximum likelihood recon- 
struction method can be extended to cover this situation by 
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renormalisingpj: one just replaces pj by Pj/po in the original 
MaxLik formulas. This so-called extended maximum likeli- 
hood principle was first suggested by Fermi (see llssll . p. 90). 
For our reconstruction method, however, we need the mean 
and variance of the likelihood function, and not just the mode, 
and these depend on 11*^°^ in a more complicated way. 

If the POVM elements add up to a multiple of identity, 
n(°) =71/1, then po = ^ constant. Otherwise, po is not 
a constant, but depends on the state p. The PDF of the corre- 
sponding measurement outcomes (CW case) is the product of 
Poissonians 



A second problem that occurs when po is not a constant is 
the geometrical shape of the probability space V. In principle, 
this shape can be derived from the relations pj — TilpH^^^, 
but this nearly always yields a complicated set and integrating 
over it is extremely difficult (as has been remarked before). 

X z 
z 1-, 



For example, let p be a qubit state p 
take the POVM elements 



and 



1 




,n(2) = 




1 



{Ap,)f^ 



= exp(-Apo)A^"nTT- 

,=1 h- 

The corresponding likelihood function is proportional to this, 
and can be converted to a PDF by normalising over the set of 
allowed values of P. Now this is exactly the problem: how 
should the probability space V of P look like when the pj no 
longer add up to 1 ? When po is a constant, po = M, it is 
reasonable to take the set of non-negative vectors adding up 
to M as probability space. Then the normalisation constant 
becomes 

eM'AM)Af° / dpi . . . dpd n TT. 

and all references to A indeed cancel. The end result is then a 
Dirichlet distribution in terms of the probability vector p/A/. 
The mean and covariance matrix of P are thus given by the 
formulas for the Dirichlet moments, multiplied by M and Af ^, 
respectively. 

The remainder of this subsection can be skipped on first 
reading, and can be skipped altogether if one always makes 
sure that the POVM elements used add up to a multiple of 
identity; this design rule is recommended. 

When Po is not a constant, this magical cancellation of A 
no longer takes place. The standard way to deal with this in 
Bayesian inference is to consider A as a random variable, too, 
and calculate the joint distribution of A and P: 

fA,p\F{a,p\f) oc exp(-apo)a-^« J| 

j=i 

Since we are not really interested in A (it is a nuisance pa- 
rameter) we then take the marginal distribution of P by in- 
tegrating out a. Using the integral daex-p{~poa)a-f° — 

fol/p{)°^^, this gives 



„/o + l 

Po 



n(3) 



1 1 
1 1 



1 i 

-i 1 



Then V is defined by pi + p2 = 1 and (pi — 1/2)^ + 
(P3 - 1/2)2 _ y2)'^ < 1/4 (essentially a Bloch 

sphere). The reader is invited to try and integrate the func- 
tion pl^Pj^Pg^pl^/pp"^^ over this set. 

As before, we propose to integrate over the smallest set con- 
taining V and giving easy integrals, and restrict to the physi- 
cal set in a later phase. The easiest way to do this is to first fix 
Po and include all points in the simplex V{po) := {p : Pj > 

0, Pj = Po}, perform all the calculations conditional on 
this assumption, and then average over the range of po. This 
range can be easily determined from the extremal eigenvalues 
of n(°^ := n^-'^ Let TO and M be the smallest and largest 

eigenvalue of H^''^ respectively, then m < po < M. 

To average over po we need a measure; to get V = 
Um<po<A/ '^(Po)^ with all points in the set equally weighted, 
this measure has to be proportional to the volume of the sim- 
plex V{pa). As this volume is proportional to Po~^, the mea- 
sure is Po^^ dpo /K, with 



K = 



M 



For fixed po, the calculations show that P/po follows a 
Dirichlet distribution, with N = fj . Thus the moments 
of P are: 



p{Pi\Pq = Po) Po 
p{PP,\Po=po) - pI 



N + d 



{N + d){N + d+l) 

(/, + l)(/, + 2) 
{N + d){N + d+l)' 

Now we average over Po- The average of Pq is 



fiiP^\Po=Po) = pI- 



C dpoPo 



KPo) 



d + k 



4>kM\ 



where we have defined 



d 1 - {m/MY+'' 
d+k 1 - {m/MY 
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The range of <f>k is between d/{d + k) and 1, obtained when 
m — and m — M, respectively. 
Hence we get 





= Af0i- 











(A^ + d)(iV + d+ 1) 



(23a) 



(23b) 



(23c) 



{N + d)[N + d+l) 
This yields the covariance matrix elements via the relations 

al^{P) = ^i{P.,Pj)-^Ji{P,)^Ji{Pj) (24a) 
al,{P) = ^i{Pf)-^i{P,f. (24b) 

For the extreme case m — 0, this gives 

da{U + l){f, + l) 



al^iP) = AP 



[d + + 2)(iV + d)2(iV + d+l) 
, + l)(a/, + 6) 

[d + + 2){N + d)2(iV + d + 1) 
a := N-d^ -d 
b := (d^ + 2d + 2)A^ + rf-'' 

As a small check, for m = Af (H^"^ = Ml), we get the mean 
and covariance matrix of the Dirichlet distribution, multiphed 
by M and M'^, respectively. 

C. Choosing a Prior 

In this Section, we tackle the problem of choosing an appro- 
priate prior distribution, as required for starting the Kalman 
filter process. We also have to solve problem of restricting the 
solution of the Kalman filter to physical space. Both problems 
could have been solved in one go by choosing the prior to be a 
uniform distribution over state space, and setting it equal to 
outside of it. Unfortunately, Kalman filtering requires a Gaus- 
sian prior, and we leave the solution of the restriction problem 
to the next section. In this section, therefore, we ignore the 
restriction to physical state space. 

When we have no prior information about the quantum state 
apart from the tomography data, we have to construct a prior 
that reflects this total lack of knowledge. Moreover, to allow 
for the application of Kalman filtering, this prior has to be 
Gaussian. One such prior could be a Gaussian with an infinite 
covariance (the mean would then be irrelevant): S = ool. 
In numerical computations, this infinity of course has to be 
replaced by a finite, but still big number b, giving S = fel. 
On the other hand, to avoid numerical instability, b should be 
not too big. 

But what does big enough and not too big mean? Fortu- 
nately, as we are dealing with quantum state estimation, we 
know that the state belongs to a bounded set: its eigenvalues 
are positive numbers summing up to 1 . To illustrate this, let 
us restrict to diagonal d-dimensional states, i.e. distributions. 



With the choice S = 61, the squared 2-norm distance be- 
tween two such distributions p and q is given by | |p — jj/fo. 
(Why we take the 2-norm distance will become clear in the 
next Section). The maximum value is therefore 2/b. To min- 
imise the influence of the prior, this distance should be small 
enough, and certainly much smaller than d. Hence, we need 
^ 2/d. In our numerical experiments, we have chosen the 
value 6=1. 

For the mean of the prior, the best choice is to take a state 
"in the middle" of state space. For distributions this would be 
the uniform distribution {I, . . . ,l)/d, for quantum states the 
maximally mixed state po — t/d. More generally, one could 
take the state that has the largest entropy within the physical 
set. 

An alternative solution to the problem of choosing a prior 
is based on the observation that the Bayesian update equation 
([Tol l is basically a multiplication and therefore all Bayesian 
updates commute. We can therefore start with any suitable 
prior and divide it out again after all Kalman filter updates 
have been performed. This amounts to the same thing as start- 
ing off with the infinite width prior This division is easy when 
the covariance matrix of the chosen prior is a scalar matrix. 
So = 61, with some finite choice of b. 

Let /xq represent some fixed state and let us consider mea- 
surement parameters of a very specific form z = HfiQ and 
— bHH* . In that case the Kalman update equations sim- 
plify to 



/i' = 6(6 + S 
S' = 6(6 + S 



S(6 + S)-Vo 



(25) 
(26) 



Using this it is easy to calculate that starting off the Kalman 
filtering sequence with the "infinite width" Gaussian prior 
(/I = and S = cx)l) and applying the Kalman update step 
{z = -ff/^o and © = bHH*) yields as updated state a Gaus- 
sian with /x' = /iQ and S' = 61. Starting off with this Gaus- 
sian as prior (p, ~ fiQ and S — 61) is therefore equivalent 
to starting off with an infinitely wide prior and applying this 
particular Kalman update step once, anywhere during the se- 
quence (anywhere, because of commutativity). In particular, 
this update can be done at the end of the sequence. 

Undoing the narrow prior can therefore be done after the 
final Kalman update by applying the inverses of equations d25] l 
and (|26] l. Denoting by /j, and S the quantities obtained at 
the end of the Kalman filter sequence, and by /icon and Scoit 
the corrected ones, with infinite prior, we have the correction 
equations 



Seo„ = (S-l-l/6)-l 

/^coi-r A* + (Scon-/6)(/^ ^ Mo) 



(27) 
(28) 



In practice, one could for example choose fiQ to be a repre- 
sentation of the maximal entropy (maximally mixed) quantum 
state (t/d). 

The problem with these correction equations is the extreme 
sensitivity of ficon to even the tiniest variations in /j, when 
Scon has very large components. While this is not nec- 
essarily a numerical artifact — large uncertainties on certain 
components of the covariance should go hand in hand with 
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equally large uncertainties on the mean — it may cause numer- 
ical problems further down the line. For that reason we try to 
avoid this situation by choosing a slightly larger value for b in 
the correction equations than was used in the construction of 
the prior To obtain a corrected covariance matrix Scon with 
an upper bound ct^^^x the variances we can choose a value 
b' satisfying 1/6 - 1/6' = l/al,^. 

D. Calculation of the Confidence Region 

The mean value and covariance matrix that we have been 
able to calculate using the Kalman filter method are not ends 
in themselves. One possible use of these is to calculate mean 
and variance of certain operators when applied to the state. In 
Ref. |5] it is shown that the mean value and variance of an 
operator Z depends on the mean state /i and state covariance 
S (the inverse of the Fischer information matrix F) via the 
relations 

(z) — Tr fiZ 
((Az)2) = z*^z, 

where 2; is a vector representation of the operator Z. The 
error bars on Z can then be derived by setting appropriate 
condifence levels. 

In this subsection, we derive more generally an expression 
for the confidence regions for the complete state, correspond- 
ing to the reconstructed mean value and covariance. The con- 
fidence region is the region around the mean value obtained 
from the Kalman filter procedure within which the actual state 
can be found "with high probability". The value of this proba- 
bility is called the confidence value and is typically chosen to 
be 95%. Stated otherwise, the probability that the actual state 
is outside the confidence region should be "low", e.g. 5%. 

For the multivariate normal distribution, with mean fi and 
covariance matrix S, the confidence region is an ellipsoid cen- 
tered around the mean. This is quite clear as the surfaces 
where the PDF assumes a constant value are governed by the 
quadratic equation {x — (x — fi) = c (note the Moore- 

Penrose inverse, as required when there are exact linear con- 
straints; see Section lTll Gi . The quantity of the left-hand side 
is the squared Mahalanobis distance between points x 
and fi, as defined by Q. The surface of the confidence re- 
gion is thus the set of points at a certain Mahalanobis distance 
from the mean. To find which value A4 should take for which 
confidence value, we have to consider the distribution of Ai . 

It is well-known that the squared Mahalanobis distance has 
a chi-square distribution with ly degrees of freedom (DoF): 
Ai^ ~ xi, where v is the rank of S, equalling the dimen- 
sion of X minus the number of independent exact constraints 
(zero variance components). A proof of this basic fact goes as 
follows: 

Proof. Suppose S has rank u and is bounded (all eigenval- 
ues are finite), then its MP inverse has rank ly as well and can 
therefore be written as 'E^ — Q*Q, where Q isai^xd matrix. 
Introduce the i^-dimensional random vector U — Q{X — fj,). 



Then the entries of this vector are independent and distributed 
as Ui ~ind A/'(0, 1). The sum-of-squares of these entries is 
then, by definition, xt distributed IBol . Since the squared Ma- 
halanobis distance A^^ is equal to 

= (a;-/x)*S^(a;-/x) 
= {x- fi)*Q*Q{x- n) 

* \ ^ 2 

= u u = 2^Ui, 

i 

it follows that, indeed, A4^ ^ xl- ^ 

We summarise the main properties of the chi-squared dis- 
tribution ll36ll . The PDF as a function of x, with x > 0, is 
given by 

-—l-—-x''/^-^ exp(-a;/2), 
and the cumulative distribution function (CDF) by 

r>Ax/2) 
r(^/2) ' 

where r(a, y) is the incomplete gamma function 

The mean of a variable ^ is v and its variance is 
2v. The variable X itself is distributed as X ~ Xv For 
not too small values of v, X is approximately normal with 
mean \/v — 1/2 and variance 1/2. The 95% confidence in- 
terval of X is therefore approximately given by < a; < 
yjv - 1/2+ 1.16309, where 1.16309 = lnvErf(0.9), the root 
of [1 + Erf(x)]/2 = 0.95. Even for the smallest v that 
we will encounter, this approximation turns out to be very 
good; for = 3 (a single qubit state, for example) the value 
X < y/3 - 1/2 + 1.16309 yields the only slightly smaller 
confidence value of 94.3%. 

One immediately obtains that 95% of the probability mass 
of a multivariate normal is contained in the ellipsoid consist- 
ing of points whose Ai'^ lies in the 95% xt confidence inter- 
val. In other words, the 95% confidence region is the ellipsoid 



:= {x-n)*-E^{x-fj,) < (Vt^- 1/2+1.16309)2 =: 7^. 

(29) 

This formula lies at the heart of many statistical procedures, 
for example outlier detection. 

Now, since we are approximating the actual posterior PDF 
by a Gaussian, the true confidence region will be different 
from the one just obtained. However, we show in Appendix 
IB 21 that the difference will not be dramatic. Even in the 
worst case, the actual distribution of is very close to 
chi-squared, but has a variance that is larger by a factor of 
about 30% (unless N, the number of measurements per run. 



In Matlab, the CDF can be calculated using the built-in function 

gammainc (x/2 , nu/2 ) . 
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is pathologically small). This means that the confidence re- 
gion will be slightly larger than the value given by ( |29] ). A 
conservative estimate is to take 

7, = (v/j^-l/2 + 1.5)2. (30) 



E. Restricting to Physical State Space 

In this Section and the next, we treat the problem of restrict- 
ing the solution of the Kalman filter to the physical region S; 
when the state is a quantum state, this means restricting solu- 
tions to state space, the set of positive semidefinite matrices of 
trace 1 . We will perform this restriction as a post-processing 
step after the Kalman filtering calculations. 

At the level of the PDF, the restriction involves setting the 
values of the obtained PDF equal to zero outside S, and then 
renormalising the PDF (as it should now integrate to 1 over S 
instead of the whole space). The whole art is to determine the 
value of the new normalisation factor. This requires the inte- 
gration of the posterior PDF over S, which is a very compli- 
cated problem, especially for high dimensions; this problem is 
known as the Bayesian integration problem. Likewise, similar 
integrals are necessary to obtain the moments of the restricted 
PDF. 

Various numerical methods have been prop osed to approx- 
imate such integrals; for an overview see lISTllssll . The partic- 
ular problem faced here is that the intersection of the uncon- 
strained confidence region with the physical set has extremely 
low volume both within the confidence region and within the 
physical set, in part due to the high dimensionality of the prob- 
lem. This turns out to be a very challenging situation for all 
existing integration methods. 

In the following, we present two approximative methods. 
The first method is very simple but rather crude and actually 
circumvents the Bayesian integration problem. It is not a very 
powerful method, because it only allows to check whether a 
state is in the restricted confidence region. For some situa- 
tions this might be already enough. If one desires to know the 
shape of the restricted confidence region, e.g. via its moments, 
then this method does not suffice. Nevertheless, the method is 
extremely simple to apply. 

A second method, discussed in the next Section, is more 
powerful and yields an approximation of the restricted confi- 
dence region in expUcit form. Perhaps not surprisingly, it is 
based again on Kalman filtering and yields an approximative 
confidence region expressed by a mean value and a covariance 
matrix. This method, while still in its experimental stages, 
seems to work amazingly well in practice. 

The simple method consists of keeping the first and sec- 
ond moments of the unphysical posterior PDF but modifying 
the A4qji value to take the renormalisation over the physical 
set into account. Furthermore, rather than calculate the exact 
value of the new Ai^j^, a conservative upper bound is taken 
that is easy to calculate. 

The method consists of two parts, of which the first one is 
optional. First, the maximum likelihood (MaxLik) solution 



is calculated. That is, the physical state for which the un- 
physical posterior PDF is maximal is calculated. This corre- 
sponds to the following semi-definite program: find the mini- 
mal value of t for which a state p exists satisfying the mixed 
semi-definite/quadratic constraints 

p > 
Tr p = 1 
(p-/x)*St(p-/x) < t. 

For this minimal t, the state p in question is the MaxLik solu- 
tion. 

Even though very efficient semi-definite program solvers 
exist ll39ll . this part can still be very time consuming when the 
dimension of the state is high. Nevertheless, finding the Max- 
Lik solution is interesting enough in its own right to warrant 
inclusion of this part. After all, this solution is what most re- 
construction algorithms try to find. In our context, the MaxLik 
solution also allows to check the validity of the tomography 
data. Indeed, given that the MaxLik solution corresponds to 
the best physical "guess" of the actual state, the former should 
lie within the "raw" (i.e. unconstrained) confidence region al- 
lowed by the measurements. Thus, the Mahalanobis distance 
between the MaxLik solution pml and the mean of the un- 
constrained posterior PDF should be below AicR- If not, this 
could be an indication that something is wrong, either with 
the data, or with the underlying assumptions (e.g. the noise 
model). 

For the second step, we consider the Mahalanobis distance 
just calculated, 

MIjl (pml - l^r^\pML - m)- (31) 

The confidence region for the constrained PDF is the intersec- 
tion of the physical set with the ellipsoid (p — /i)*I]^(p — /i) < 
^CR.phys^ where M^j^ pj^y^ is the confidence value for the 
constrained posterior PDF. Note that /i and E are still the 
moments of the unconstrained PDF. Because the constrained 
posterior has to be normalised over the physical set only, 
A4cB.phys will be larger than Men- Calculating the exact 
value of this AicR.phys is an extremely difficult problem, but 
we can prove the validity of a very simple upper bound: 

'McR^phys < M.Cri.Mnphys + M.ML, (32) 

with Mml given by Eq. dSTl ). The proof of this bound is 
given in AppendixlAl 

In case one does not even want to calculate the MaxLik 
solution, and one is willing to believe this solution is in the 
unconstrained confidence region, A^ml can be replaced by 
its (presumed) upper bound A4cR,unphys, giving the simple 
result that the constrained confidence limit is at most twice the 
unconstrained one: 

M.CR,phys < "^M-CR^iinphys- (33) 

In conclusion, the simple method consists of doing the fol- 
lowing: 

1. Depending on resources and taste, choose between 
steps 2 or 3, then proceed to step 4. 
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2. Calculate the physical MaxLik solution, i.e. the solu- 
tion in V that minimises the Mahalanobis distance to 
the (unphysical) /j,. Record ^AML, the minimal Maha- 
lanobis distance just found. 



3. Or, just take Mail ^ M 



C R.unphys • 



4. A physical state p is in the physical confidence region 
if its Mahalanobis distance to /i is not (much) above 

■McR,unphys + M.ML- 



F. Restricting to Pliysical State Space; Kalman Filter Metliod 




Let us now move on to our second method for restricting 
the posterior PDF to physical state space. It is a more in- 
volved method but gives more information. To simplify the 
discussion, consider an example where X is a li-dimensional 
real variable, and the physical region consists of the positive 
orthant Xi > 0, i — 1, . . . , d. We assume that, after incor- 
porating the measurement data, the unrestricted PDF of X is 
(approximately) given by its mean jjb and its covariance matrix 
S. We assume that the corresponding confidence region is not 
completely contained in the physical region; otherwise, there 
would be nothing to do here. 

Consider now the marginal distributions of each of the com- 
ponents XiOf X. The marginal distribution of Xi is of course 
normal, with mean pi and variance Y^u. We can then easily 
calculate the confidence interval of each Xi for given confi- 
dence levels. There will at least be one such Xi whose confi- 
dence interval will not be completely contained in the physical 
interval Xi > 0. Broadly speaking, this is the one-dimensional 
marginal version of our restriction problem. 

The first key idea of our proposal is to consider the marginal 
distribution of this Xi. If more than a fixed amount a of prob- 
ability mass of this marginal is outside the physical interval 
Xi > 0, we truncate the marginal to that interval. That means, 
we set the density equal to outside the physical interval, and 
renormalise to 1 . We then approximate this truncated normal 
distribution by an ordinary normal distribution, with appropri- 
ate mean fi and variance a^. How we will do this is described 
in the next subsection. 



1. The marginal restriction problem 

The obvious idea of using moment matching to approxi- 
mate the truncated normal is of no use here because we need 
a procedure that gives a stable result when applied twice 
or more. Approximating a truncated normal with moment 
matching does not yield a distribution with controlled tails 
(the tail being that part of the distribution outside the physi- 
cal interval). Truncating it again and approximating that trun- 
cated normal with moment matching for a second time will 
typically yield yet another set of values for mean and vari- 
ance. There is no guarantee at all that this process would con- 
verge. For that reason we seek an approximation procedure 
that controls the tail probability explicitly. 



FIG. 1: Illustration of the approximation process of Sec. lIIIFTI Nor- 
mal distribution with moments /i = — 1 and o = 1 (full line). The 
corresponding truncated normal distribution, truncated to a; > 
(dashed line). The best approximating normal distribution with 5% 
of probability mass outside the interval x >Q (dotted line). 

We will require that the approximating normal distribution 
has exactly an amount a of probability mass outside the phys- 
ical interval x > 0, where a is small, say 5%. For an illustra- 
tion of this approximation process, see Fig.[T] 

As a quality measure of the approximation, we will use 
the Kullback-Leibler distance again Dkl{p\\(i) with the trun- 
cated distribution as first argument (thus, with integration in- 
terval restricted to a; > 0) and the approximating distribution 
as second argument. Without the restriction on the approxi- 
mating distribution this would result in the moment matched 
approximation. 

Minimising this distance over all choices of approximating 
distributions amounts to maximising the following function 
over the parameters jl and a: 

/ dx (t){x; iJ, cr) log (j){x; p., a). 
Jo 

The requirement we imposed on the probability mass in the 
left tail of the approximating normal translates to the equaUty 

p,^ aa a, (34) 

where uq = V2 InvErf((Q! + l)/2). For a = 5% we find 
ao ~ 1.64485. After some algebraic manipulations one finds 
the following complicated set of formulas for the optimal a: 

a ^ (7 (^-aog + \J a^g"^ - 2cj (35a) 

c = tT- {l + t^)/2 (35b) 

9 = r-t/2 (35c) 

T = cxp(-tV2)/[V2^ Erfc(t/\/2)] (35d) 

t ~ — m/o". (35e) 

In Fig.|2] we plot the ratio ct/ct as a function of t. 

This one-dimensional solution has now to be translated 
back to the original setting of restricting the state vector X 
to the physical set. As this translation is basically an inverse 
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FIG. 2: The ratio a ja plotted as a function of t [Eq. J35H for qq ~ 
1.64485 andt > --ao. 



problem involving normal distributions only, we will again 
use a Kalman filter to solve it, as explained in the next subsec- 
tion. 



2. Backprojecting the marginal restriction using Kalman filtering 

Our second key idea is to enforce the original unrestricted 
distribution to have a marginal distribution (for component 
Xi) given by the one-dimensional approximating normal, 
with parameters jl and a. Formally, this is equivalent to per- 
forming a linear one-dimensional measurement, and we can 
incorporate its effects on the state X by a Kalman filter up- 
date step. The parameters of the measurement (measurement 
matrix H, mean z and covariance 0) will have to be such that 
the effect of the measurement is the required enforcement of 
the marginal mentioned above. 

As we only want to enforce the marginal of the Xi com- 
ponent, we will set H equal to the row vector Thus, 
Hx = Xi. This is a one-dimensional measurement, so that its 
mean z and covariance 8 will be one-dimensional too. Cor- 
respondingly, the Kalman gain is a column vector Inserting 
this into the Kalman filter update equations ( |20l ) gives 

K = T,H*{H-EH* + (d)-^ = T,\i) (Ei, + e)-i 
/x' = fi + K{z — Hfi) = fi + {z — fii)K 
S' = (]l-Ki?)S = S-K(i|S. 

The Xi marginal of the updated distribution will then have 
moments /i^ and given by 

Ml = + {z ~ ^i){i\K ^ fii + {z - fii)^ii{^ii + Qy^ 

E^, = E,,-(z|iv:E,, = [i-E,,(E,, + e)-i]s,,. 

We find the required values for the measurement parameters z 
and 8 by solving the equations /i^ = jl and = a^. 
The solution is 

e = {1/k-1)Eu (36) 

Z = - (1 - K)^J,i]/K (37) 

K = (38) 



Here, /i and a are given by the equations ( |34] | and ( l35T l, with 
fi = fii and a = ^/^u, the marginal moments of Xi in the 
original distribution. 



3. The Restriction Procedure 

In the previous subsections, we have shown how a single 
variable Xi can be restricted to its physical interval Xi > 0. 
In general, if the physical set is convex, the set is defined by a 
number of such inequalities, possibly an infinite number. For 
simplicity, we first treat the case that the physical interval is 
defined by a finite set of inequalities Xi > 0, Vi, and treat the 
more general case below. This case corresponds for example 
to diagonal quantum states, and also to the optical POVM of 
Section[Vl 

To restrict the complete state vector X to the physical 
set, we repeat the above-mentioned procedure for every com- 
ponent of X, or at least for those components for which 
Mi/fj < uq. In general, however, because of correlations, 
multiple components of X will be affected by a single step 
of the procedure, and it could very well be that the work of 
previous steps is partially undone by the current step. For ex- 
ample, forcing X2 to be positive could bring Xi back into the 
non-physical region. 

Therefore, a number of runs of the algorithm will be neces- 
sary, stopping when all marginals have their confidence inter- 
vals approximately within the physical interval. As the quan- 
tity mini Mi /'^i will converge to ao from below, a good stop- 
ping criterion is min; /ii/ui > (1 — e)ao, where e is a small 
positive number. In practice, e should not be chosen too small, 
so that the algorithm terminates in reasonable time; in our ap- 
plications we chose e — 0.003. 

The order of the iterations, namely which marginal Xi to 
treat first, does not seem to influence the end result very much. 
In one set of experiments we treated the marginals in fixed or- 
der, and in another we always chose the marginal with small- 
est Hi jai first. It is not clear that the latter order should con- 
verge faster because of the correlations between the Xi, in our 
experiments it only did marginally so. While this and other 
convergence issues are still under investigation, they appear 
not to be of major importance. 

An illustration on a small example is shown in Fig. [3] The 
left graph shows the result of the restriction of X\ to the phys- 
ical interval Xi,X2 > 0. One sees that the new distribution 
again crosses the border of the physical set, but now compo- 
nent X2 is involved, although it did not before the update. A 
second Kalman filter update step will therefore be necessary, 
on X^- The result of that second step is shown in the right 
graph of Fig. [3] 

In the case that the number of inequalities defining the 
physical set is infinite, for example for the set of quantum 
states, where the inequalities are TrpX > 0, VX > 0, the 
fixed order rule is obviously infeasible. The smallest-first rule, 
on the other hand, requires the complicated minimisation of 
X* p/ \/X*PX over all X > 0. A third, and much simpler 
possibility is to choose X at random. However, that method 
exhibits slow convergence, especially in the final stages when 
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FIG. 3: Illustration of the Kalman filter update step of section ITllF 21 The physical set consists of the positive orthant. We start with an 

unrestricted normal distribution with mean (—1, 2) and covariance S = ^ ^ g ^'^ ^ . The contour lines of the PDF are plotted in full 

lines in the upper left of both graphs. The Xi marginal of this distribution is the same one as plotted in Fig. [T] The normal distribution 
approximating the truncated marginal is reproduced here as the full line curve at the bottom of the left diagram. After applying one step of the 
Kalman filter, with parameters given by Eqs l |36l38t . we obtain a normal distribution whose PDF is plotted in the left diagram in dashed lines. 
In the graph on the right the result of a second Kalman filter update step is shown, now for component X2 ■ 



the number of unsatisfied constraints becomes small. 

A better option is to consider a combination of two rules 
in the following double iteration: the inner iteration consists 
of, given a unitary U, performing the restriction on the diag- 
onal of UpU*, that is with H = Vec{U*e"U)*, and varying 
i. This inner iteration can be performed either using the fixed 
order rule i = 1, . . . , c? or the smallest-first rule. The outer 
iteration consists of choosing a new random unitary U each 
time, until a suitable stopping criterion is satisfied, e.g. un- 
til the inner iterations achieve no further reduction of /i/u. 
Although the smallest /i/(T component does not necessarily 
occur for U diagonalising p, it is a good idea to choose the 
unitary U diagonalising the current p every now and then. 



G. Incorporating Exact Linear Constraints 

In many cases, the description of the physical state is sub- 
ject to one or more exact constraints. For quantum states the 
trace of the density matrix is 1, for trace preserving quantum 
processes the partial trace of the state representative over the 
output Hilbert space is the maximally mixed state 1 /rf, and 
for POVMs the sum of all the elements must be the identity 
matrix. Depending on the physical system, there may be fur- 
ther constraints like this. These exact linear constraints can 
be incorporated into the reconstruction process in a number of 
ways. 

A first approach is to incorporate exact constraints via 
dummy measurements with zero measurement covariance, 
and replace inverses by Moore-Penrose (MP) inverses. The 



benefit of this method is that virtually no changes to the 
Kalman filter implementation are needed. A serious drawback 
is that the state covariance matrix becomes ill-conditioned, 
since exact constraints correspond to zero variance compo- 
nents. In reality, numerical round-off causes these compo- 
nents to have non-zero variance, which makes it hard to dis- 
criminate between variances that are nominally zero and those 
that are not. This is a notorious problem for actual implemen- 
tations of Kalman filters and may cause serious numerical in- 
stabilities. Later on in the calculations, the Mahalanobis dis- 
tance {x—iJ,)*'E~^{x—fi) has to be calculated (see Sec lIIIDl) 
and even the smallest deviation inx — from the exact con- 
straints is blown up by the inverse of S. 

A second approach is to store exact constraints in two ad- 
ditional matrices, along with state mean and state covariance. 
In general, exact constraints may have an impact on the state 
but also on the measurement. For example, when the state 
is a quantum state, we have the exact constraint on the state 
Tr (0=1, and a corresponding exact constraint on the mea- 
surement probabilities J2iPi ~ 1- This implies that the dif- 
ference between any two states, e.g. the mean X and its up- 
date X', should lie in a subspace, namely the one for which 
the trace is zero. Similarly, the difference between two mea- 
surements, e.g. the actual measurement outcome z and the 
expected outcome Hfi, should also lie in a subspace, namely 
the one for which the sum of all components is zero. 

Both subspaces can be represented in calculations by two 
projectors, Tx and Tz- The projector Tx projects on the sub- 
space in state space, and Tz on the subspace in measurement 
space. The Kalman filter update equations can be made more 
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resistant to numerical inaccuracies using these projectors, en- 
suring that the exact constraints are obeyed in any iteration of 
the update process, as follows: 



Using these definitions (and a little work), the Kalman filter 
update equations can be rewritten as follows: 



y 


= Tziz-Hn) 


(39a) 


s 


= Tz{HY,H* + &)Tz 


(39b) 


K 


= i:h*s^ 


(39c) 




= Txin- Ho + Ky) + Ho 


(39d) 




= Tx{'S-KH-S)Tx 


(39e) 



Here, /xo is a reference state, e.g. the maximally mixed state 
t/d. 

Note that the ordinary inverse in the formula for the Kalman 
gain K has been replaced by the MP inverse. Likewise, the 
inverse of S appearing in the formula for the posterior PDF 
corresponding to the Kalman filter solution has to be replaced 
by an MP inverse too. 

A third approach is to parameterise the state such that the 
exact constraints are inherently satisfied. The obvious ben- 
efit is that the exact constraints do not have to be explicitly 
imposed. A second benefit is higher numerical stability, and 
straightforward invertibility of all matrices that have to be in- 
verted. 

We start again from the projectors Tx and Tz. From these 
projectors we can derive two partial isometries, Xi and Zi, 
such that the following holds: the number of columns of Xi 
and Zi must be equal to the ranks of Tx and Tz, respectively; 
XiX^ = Tx, ZiZ* = Tz; and X*Xi = 1, Z^Zi = 1. 
Numerically, these partial isometries can be calculated from 
a singular value decomposition (SVD) of the projectors. For 
example, let Tx = USU*; the partial isometry Xi is then 
obtained from the unitary U by dropping those columns that 
correspond to the zero-valued singular values. 

Roughly speaking, using these partial isometries, the matri- 
ces 0, S and H can be "cut down" to their invertible parts, 
which we will denote by a tilde. Define 



:= Z*0Zi. 



(40) 



Since the support of is exactly the support of Tz, we also 
have the reversed equality = Zi@Z*. Furthermore, is 
full rank and therefore invertible. In a similar way we define 



S :=XiSXi, 



(41) 



which is also full rank and invertible and satisfies S = 
XitX*. 

Furthermore, /j, and z live in certain affine subspaces. If /xq 
and Zq are fixed reference vectors in these affine subspaces, 
we find that /it — /ito is a vector in the support of Tx, and 
z — zq& vector in the support of Tz. Then we can define 



fi := Xl{n-no) 
z := Z*{z-Zo), 



(42) 
(43) 



and these again obey /x — fiQ + Xip, and z = zq + Ziz. 
In addition, it is possible, and best, to choose zq such that 
Zo = Hxq. Finally, we define 



K 


= XiKZl 


(45a) 


k 


:= tH*{HtH* + e)-'^ 


(45b) 




= Ho + Xi[fi + k{z- Hp,)] 


(45c) 


s' 


= Xiit- kH'E)X^ 


(45d) 




= Xii'E-'^ + H@-'^H*)-'^X^ 


(45e) 






= z^ezi, 


= Zi&Zl 


S 


= X^-EXi, S 


= XitX^ 


A 


= ^r(/^-/^o) 




z 


= ^r(z-zo) 




H 


= Z^HXi. 





It has to be stressed again that all inversions here are ordinary 
ones, not MP inverses. One sees that the equations reduce 
to the original Kalman filter update equations provided one 
always works with the "tilde" quantities. For the sake of ref- 
erence, we combine all definitions here again: 



(46a) 
(46b) 
(46c) 
(46d) 
(46e) 



All required calculations can be expressed directly in terms 
of tilde quantities. For the initial (prior) Sq, we choose bTx 
(rather than 61), which amounts to setting Sq = bl. Con- 
cerning the Mahalanobis distance, if we define x := X^x, 
we have for any vector x (as long as it is in the support of Tx ; 
if not, the Mahalanobis distance will be infinite) 

{x - /x)*Sl'(£c - n)^ix- /i)*S-i(£ - /i). 

As an example, we consider the case of A^-run pulsed mode 
state tomography. Then the constraints on the state are Tr p = 
1. Denoting the dimension of the underlying Hilbert space 
by D, this translates to ajo = |1d)/-D and Tx = l^^ — 

1 1 £) ) ( 1 I . The measurement vector z must in turn satisfy 

the constraint Y^'f-i Zi = I, with d the number of POVM 
elements of the measurement POVM, i.e. (l|z) = 1. Hence 
Tz = l,-i|l)(l|. 

The corresponding partial isometries can be found numer- 
ically using an SVD, as indicated, but for this particular case 
analytical formulas can be found. Let U be the d-dimensional 
discrete Fourier transform-kernel 



U 



(j,k = ^ exp[27ri(i - l)(fc - l)/d\, l<j,k<d. 



Let W be the matrix obtained from this by dropping the first 
column (which has constant entries). This W isa good choice 
for Zi, as can be readily checked. Similarly, for Xi we can 
choose the matrix obtained from 

5j,k, d+1 /f.i - 1 
^j,k = { Uf,k', j-l = {d+ 1)(/ - 1), 
k-l = {d + l){k' -1). 



H := ZlHXi. 



(44) by dropping the first column. 
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H. Graphical Representations of the Reconstruction 

In the previous sections we have presented a methodology 
for state reconstruction from tomographic data by which a 
Kalman filter is used to obtain a normal approximation to the 
likelihood function /x|f> where X is the state and F is the 
measurement data. The normal approximation is defined by 
its two moments: the mean state vector fi, and the co vari- 
ance matrix S. These two moments should in principle suf- 
fice as a complete statistical description of the reconstructed 
state (within the limits of the normal approximation). 

When it comes to presenting the reconstruction, however, 
there are a number of problems with the use of mean and co- 
variance matrix alone. Consider, for example, the reconstruc- 
tion of an optical POVM using our method, as discussed in 
Section[V]below. The reconstruction of the diagonal elements 
of the first element is shown in Fig.|6] The reconstructed mean 
is plotted as the centerline in the figure. On top of that, we 
would like a depiction of the covariance matrix, because this 
matrix essentially describes the reconstruction uncertainties. 



1. Depicting the covariance matrix 

The first problem one is faced with is that the covariance 
matrix S, being a matrix, cannot really be depicted in a very 
meaningful way. Nevertheless, as the whole purpose of cal- 
culating it is to provide some kind of error bars on the recon- 
struction, it is desirable to have some means of representation. 
One can do this by plotting its diagonal elements \/Y^i as er- 
ror bars on the mean value. This is meaningful because the 
diagonal element is exactly the variation of the marginal 
distribution of Xi. Of course, such a plot has to be accom- 
panied by the proviso that the plot can only be indicative, be- 
cause the variations on the elements are in general correlated. 



2. Avoiding reconstruction artifacts 

The second, and more important problem we want to ad- 
dress in this Section is the appearance of reconstruction arti- 
facts in the reconstructed mean. Closer inspection of Fig. |6] 
reveals the presence of a wave-like pattern in the center- 
line, while from theoretical considerations of the underlying 
POVM model there really is no reason why that pattern should 
be there. Such artifacts are typical for maximum-likelihood 
reconstruction methods and are well-known in image restora- 
tion [28]. Even though the wave pattern in the POVM recon- 
struction stays well within the error bars, which is already a 
clear counter-indication to its statistical significance, it would 
be better to have a reconstruction not showing such artifacts 
at all. Two methods for obtaining artifact-free solutions (or at 
least for suppressing the artifacts) are described below. 



3. MaxEnt reconstruction 

A widely used method for suppressing reconstruction arti- 
facts is the MaxEnt method, first proposed by Skilling in the 
context of image reconstruction [28]. Originally, the method 
was formulated as choosing a special prior PDF based on the 
entropy S{x) of the states (provided such an entropy exists). 
In many cases a state can be formally identified with a prob- 
ability distribution, after suitable normalisation. This is pos- 
sible whenever the state consists of a set of positive numbers. 
For digital images, the PDF is the list of intensities of each 
pixel. For quantum states, it could be the list of eigenvalues of 
the density matrix. In those cases one can assign a meaningful 
entropy functional to the state space. For quantum states, the 
von Neumann entropy is the obvious choice. 

The MaxEnt method then consists of choosing the func- 
tion exp(a5(a;)) (properly normalised), where a is a fixed 
parameter, as prior PDF. Inference then proceeds in the normal 
way, by calculating the posterior PDF and finding the maxi- 
mum likelihood solution. The upshot of this choice of prior is 
that in the absence of other information, preference is given to 
states with higher entropy. The parameter a characterises the 
amount of preference. Jaynes' principle of maximum entropy 
ll40ll could be seen as a legitimisation of this approach. 

In the context of quantum tomography, Hradil and Rehacek 
ll4lll advocated a combination of the maximum entropy 
method with the maximum likelihood (MaxLik) reconstruc- 
tion method, which they called MaxEnt assisted MaxLik 
(MEML) tomography. This method can be seen as a special 
case of Skilling's MaxEnt method. In their paper, they con- 
sidered the situation of incomplete measurements. This corre- 
sponds to a likelihood function whose covariance matrix has a 
certain number of eigenvalues that are (almost) zero, while the 
others are infinitely large. The MaxLik reconstruction is thus 
known with certainty to lie in a certain subspace, but its po- 
sition within that subspace is completely unknown. In other 
words, there exists not a single state maximising the likeli- 
hood function, but a whole plateau of states. The proposal of 
[41] consists of finding the point on that plateau (i.e. in the 
MaxLik subspace) for which the entropy S{p) = —Trp log p 
is maximised and take that point as the reconstruction. 

From an experimental viewpoint, the situation considered 
in Ref. [|4lll . of variances that are either zero or infinite, is an 
idealised one. In practical experiments, the number of mea- 
surements is finite, so that even the most precisely known state 
components have a non-negligible variance. Second, there 
may be practical and/or technical limitations on the kind of 
measurements that can be performed, so that some variances 
may be very large, but still finite. In Sec.|V]we will see a clear 
example of this. In that section the reconstruction of an op- 
tical POVM is described. While the elements of this POVM 
are diagonal in the Fock basis, its tomography is based on 
coherent states rather than Fock states, because the latter are 
extremely hard to produce. This causes large variances on 
the reconstructed elements without a clear-cut distinction be- 
tween perfectly known and completely unknown components. 
When dealing with such realistic experiments, the full-blown 
MaxEnt method is much more preferable. 
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In its original formulation as a choice of prior, the MaxEnt 
method has a number of shortcomings. One is that there ap- 
pears to be no satisfactory and rigorous way of choosing the 
parameter a. Secondly, the principle of maximum entropy 
does not necessarily apply to the entropy of the states. In 
quantum tomography we are dealing with a controlled system; 
the system is being prepared in a predefined quantum state, to 
the best of the preparer's abilities, and the tomography acts 
on a sequence of independent identically prepared systems. 
In thermodynamical terms, this corresponds to a system that 
could be as far from equilibrium as the preparer wants it to 
be. This has to be contrasted with Jaynes' MaxEnt principle, 
which has been inspired by the statistical mechanics of sys- 
tems in near-equilibrium, and which is based on the argument 
that the probability of a macro-state should be proportional 
to the number of microstates consistent with it, i.e. is pro- 
portional to its thermodynamic entropy. For systems close to 
equilibrium, we agree that it makes sense to choose a prior 
distribution that assigns more weight to states with higher en- 
tropy. For controlled systems, and for those systems lacking 
a fundamental notion of entropy, we are more tempted to opt 
for a uniform distribution, as we have done in this work, and 
incorporate the maximum entropy idea as a regularisation, as 
explained below. 



4. Regularisation 

Rather than apply the MaxEnt principle, which we deem 
not always appropriate, one can adopt a more pragmatic ap- 
proach in which the entropy functional is no longer fundamen- 
tal and can be replaced by other functionals. And rather than 
replace the prior PDF with the chosen functional, which im- 
plicitly changes the final posterior PDF, and choose the maxi- 
mum likelihood solution for that changed posterior, the regu- 
larisation method does the following (Ref. |28], Sec. 6.2): the 
prior PDF is unchanged, and within the confidence region of 
the resulting posterior PDF (unchanged as well) 

it finds the solution that maximises the chosen functional. 
When expressing the functional as a cost, or penalty function, 
this would be a minimisation. 

Since the entropy is a concave functional, maximising it 
over a convex set (such as the confidence region) is a convex 
problem and can be efficiently solved numerically. Likewise, 
minimising a cost function again gives a convex problem pro- 
vided the cost function is convex. Proper distance measures, 
for example, would therefore be good cost functions. 

Which cost function to use really depends on the prob- 
lem setting. In the example of the optical POVM men- 
tioned above, theoretical considerations suggested [19,1 that 
the smoothness of the POVM elements, defined as 

Q({n«}.) = f:xi(nS-nf )' (^7) 

fe=l i=l 



could be appropriate. In fact, this smoothness is a commonly 
used reg ularisation functional in image reconstruction meth- 
ods 1I28I1 . It is immediately clear that this Q is a convex func- 
tional, as required. The appropriateness of this cost function 
comes from the fact that it penalises the 'wavyness' of the 
centerline, as exemplified in Fig. |6l 

When the cost function is quadratic, like this smoothness 
term, the minimisation problem is a quadratically constrained 
quadratic programming (QCQP) problem. Such problems can 
be efficiently solved using semi-definite programming (SDP) 
solvers |42]. For the sake of definiteness, let us consider 
the case where the states are quantum states (p > and 
Trp — 1). The general form of a quadratic cost function 
can then be written in terms of a matrix A and a vector b as 
{Ap - b)*{Ap - b). The SDP form of the QCQP problem is 
then: minimise the (slack) variable t over all t and p under the 
combined quadratic and semi-definite constraints 

P > 

Tip = 1 

{Ap-b)*{Ap-b) < t 

(p-/x)*St(p-/x) < Mil,, 

This problem can be solved in a straightforward way by SDP 
solvers like Sedumi L39J . 



IV. APPLICATION I: STATE RECONSTRUCTION OF AN 
ENTANGLED 2-QUBIT STATE 

The methods introduced in this paper have all been tested 
on real sets of tomographic data. In this Section and the next 
we report on two such applications, one in state tomography 
and one in POVM tomography. 

In the present Section, we consider the reconstruction of 
tomography data of a source of polarisation-entangled pho- 
ton pairs, obtained by Langford et al ll43ll and compare our 
results to their reconstruction. The source is a BBO-crystal 
down-conversion source operating in CW mode, pumped by 
an Argon laser. Two sets of tomography data were taken, 
one directly on the crystal, and one on the single mode fibres 
(SMF) attached to the crystal. In both cases, the sequence of 
measurements is as given in Tab. U This measurement basis 
is over-complete because not all measurements are needed to 
obtain a full state reconstruction. Nevertheless, it was argued 
that by taking an over-complete basis a more accurate recon- 
struction could be obtained. 

A nice consequence of this choice for our reconstruction 
method is that the projectors of the 36 basis states add up 
to a multiple of the identit y, J^lt i IV'^''^) (V''''^ I =91. As 
has been discussed in Sec. IIIIB5I this allows us to consider 
these projectors as if they were POVM elements of one big 
over-complete POVM with normalisation factor M = 9. We 
can thus take all click frequencies and put them in one 36- 
dimensional vector /. Similarly, we have a 36-dimensional 
vector of probabilities p such that p/M is a genuine (nor- 
malised) probability vector. As the measurements are ob- 
tained in CW mode, the frequencies are Poissonian and after 
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1 HH 


13 DH 


25 RH 


2 HV 


14 i3V 


26 


3 VH 


15 AH 


27 


4 VV 


16 ^1/ 


28 LV 


5 HD 


17 Di3 


29 i?D 


6 


18 DA 


30 RA 


7 


19 


31 LD 


8 VA 


20 AA 


32 Lv4 


9 Hi? 


21 Di? 


33 RR 


10 HL 


22 DL 


34 i?L 


11 l^i? 


23 AR 


35 LR 


12 l^L 


24 


36 LL 



TABLE I: Sequence of measurements in the state tomography of 
the BBO source of Application 1 (Section HVt . The labels H, V, 
D, A, R, L refer to the polarisation basis states: = (1, 0), = 
(0,1), D = (l,l)/\/2, A = (l,-l)/%/2, R = (l,i)/^and 
L = (1, -i)/^/2. 

Bayesian inversion (/ — > p) we find that p/M is Dirichlet 
distributed with parameters / and N — J2k=i fk- 

The upshot of all this is that the Kalman update equa- 
tions have to be executed exactly once, with z given by 
z = A/ /i(Dirichlet(/)) and © by AP times the covariance 
matrix of Dirichlet(/). This is particularly convenient, be- 
cause the issue of setting an initial prior and removing it again 
after the Kalman updates (see Section IIIICI ) can be resolved 
analytically, which allows us to choose an infinitely wide ini- 
tial prior b = oo without getting into numerical trouble. With 
such a prior, the Kalman update yields the following posterior, 
as can be checked with a modest amount of work: 
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0.025 - 




-0.05 -0.04 -0.03 -0.02 -0.01 

FIG. 4: A display of the reconstruction results for Application 1, 
Section ITVl showing a slice through state space illustrating the po- 
sition of the two-photon state, reconstructed from the data obtained 
by measuring directly at the BBO crystal. What is shown is the pro- 
jection of the state on the 2D subspace spanned by two well-chosen 
pure state projectors. The two concentric ellipses centered about the 
reconstruction mean /x are the projections of the 50% (inner ellipse) 
and 95% confidence regions (outer ellipse), respectively; these el- 
lipses are quite close together due to the rather high dimension of 
the system (d = 15). The intersection of the physical set with the 
subspace is the triangle xi > 0, X2 > 0, xi + X2 < 1, of which 
the lower left corner is shown. The projection of the MaxLik solu- 
tion pML is also shown. This solution is well within the confidence 
region, as should be. For comparison purposes we have also plotted 
the projection of the "naive" least-squares reconstruction pN- 
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(48) 
(49) 



Note that these formulas are stated in terms of the "tilde quan- 
tities" [see Sec. IIIIGI Eq. ^^]. Both the state and the fre- 
quencies satisfy exact constraints, Tr p = 1, and X]fc=i fk = 
N, and we have chosen to deal with these constraints in the 
numerically most stable way, by "cutting off" the kernels 
(zero eigenvalues) of the respective operators. In the deriva- 
tion of the above formulas care has to be taken because the 
product HH* is not full rank. 

We show the results of the tomographic reconstructions of 
the measurements at the crystal and at the SMF in Figs.|4]and 
|5j respectively. Obviously, we cannot show the confidence 
regions in full 16-dimensional space, and we have chosen a 
2D subspace spanned by two pure state projectors. We take 
the two eigenvectors tp and <j> of the reconstructed mean state 
fi that correspond to the 2 smallest eigenvalues (one of them 
being negative). The parameters xi and X2 are then given by 
the mappings p xi = {iplpli/;) and X2 = {(t)\p\<j)). 

For both cases we calculate the least-squares solution and 
the MaxLik solution. The least-squares solution pat is the 
state Pat = 

EjCj#'^'')(V''^''|/EjCj' where the coeffi- 
cients Cj are the least-squares solutions of the system fi ~ 

(V'^'^l = E,Cil(^«|^(^')p. The 

MaxLik solution is the physical state pml for which the Ma- 
halanobis distance from the reconstructed mean state is min- 



imal. We have implemented this in Sedumi, as indicated in 
SeclmE] 

In fi^'l, the MaxLik solution was calculated in a different 
way, through the minimisation of a penalty function 



n(p) E 



[h-Apk{p)Y 
[Apk{p)Y 



Here A is the unknown brightness factor of the experiment. 
This MaxLik solution closely matches the MaxLik solution 
obtained through our KF method. To obtain a quantification of 
the accuracy of the MaxLik solution, Langford used a Monte 
Carlo calculation to estimate the mean value of Ii{p ml ) when 
fk is considered as a Poissonian random variable with mean 
Apk{pML)- From this mean value, a fit quality parameter Q 
is obtained by dividing the mean value by the total number of 
measurements and taking the square root. Ideally, the mean 
value of Q should be 1 . 

Compared to the full error bars of the KF method, the Q 
quantity conveys little information about the statistical errors 
and it is not clear what the acceptable values of Q should be. 
Moreover, the Monte Carlo calculation needed to find Q is 
several orders of magnitude slower than the KF algorithm. 
Langford reports MC running times of about 150 seconds for 
200 MC iterations. In contrast, our KF algorithm runs in 0. 12 
sec (about 1000 times faster), while at the same time offering 
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FIG. 5: As Fig. ID but with the state measured at the SMF outputs. 
Because the duration of the tomography run was twice as long as in 
the case of Fig.|4l the confidence region is much smaller. The main 
difference, however, is that the confidence region lies deep into non- 
physical space, meaning that the MaxLik solution is far outside the 
confidence region. This is not a deficiency of the KF reconstruction 
method, nor of its implementation, but is actually a feature. It is a 
diagnostic feature that indicates that something is wrong with the as- 
sumptions about the underlying noise model. A likely possibility is 
that the measurements are subject to additional fluctuations. Accord- 
ing to (43I1 the most likely source is temperature-dependency of the 
spatial alignment of the SMFs, which caused the measurements to 
drift. To get a proper reconstruction this drift should be taken into 
account in the noise model as an additional term. 

much more error information, with a clear statistical interpre- 
tation. 



V. APPLICATION 2: RECONSTRUCTION OF AN 
OPTICAL POVM 

Following a proposal of Ref. ll44l] . in Ref. ||45|] an exper- 
imental realisation was reported of an optical detector with 
photon-number resolving capabilities. The basic idea is to 
carve up an optical pulse into 8 portions and detect the pres- 
ence of photons in each of these portions. More precisely, 
this setup simulates a cascade of beam splitters and eight 
avalanche photo-detectors (APDs), with the probability of a 
photon arriving at a certain APD being roughly 1 in 8. The 
number of detectors clicking therefore gives an indication 
of the photon numbers in the pulse. The detector is imple- 
mented using two Franson interferometers, an additional bal- 
anced beam splitter, two avalanche photo-detectors, and two 
identical circuits for performing time binning. 

The behaviour of this composite detector can be described 
by a 9-element POVM, where each of the outcomes corre- 
sponds to the number of APD's clicking (from to 8). We de- 
note the POVM elements by {n^'''^}|^Q, where the elements 
nC^) are positive semi-definite and add up to the identity ma- 
trix. In principle, the elements are infinite dimensional (cor- 
responding to photon numbers being unbounded), but we will 



truncate them at a certain dimension d (in our calculations we 
have chosen values of d of up to 170). Since this detector has 
no phase reference, it is insensitive to phase, which means that 
the POVM elements have to be diagonal in the Fock basis. 

To obtain a precise characterisation of the POVM elements, 
a tomography experiment has been performed |19] by which 
a large number of pulses consisting of coherent states \a) of 
ever increasing power (cx |ap) were sent to the composite de- 
tector and the resulting numbers of detectors clicking were 
recorded. The parameter a was sweeped from 0.4 to 11, in 
steps of about 0.01, and for each value of a, = 38084 mea- 
surements were taken. Per value of a the measurement record 
consisted of the number of pulses fk that caused k detectors 
to click, for fc = 0, . . . , 8; obviously, X]fe=o fk = 

Using these data, a reconstruction of the POVM elements 
(without error bars) was obtained and presented in |19!l. Here 
we take the same data and perform a reconstruction based on 
the KF method, yielding a maximally likely solution with in 
addition a definite confidence region. To avoid any confusion, 
we stress that the object under scrutiny is a POVM and the 
measurement is made using prepared quantum states. In other 
words: the POVMs are states and the state is a POVM. 

We have calculated the (unphysical) mean value /i and co- 
variance matrix E using Kalman filtering, including T pro- 
jectors for including the exact constraints that the POVM el- 
ements must add up to the identity matrix. Then we applied 
the KF method for restricting to the physical set, giving phys- 
ical mean value and covariance matrix. Finally, we calculated 
the maximally smooth solution within the physical confidence 
region. 

In Fig. |6] we depict the final results for each of the POVM 
elements, showing the physical mean value solution, the error 
bars, and the smoothed solution. The smoothed solution of 
all POVM elements together is depicted separately in Fig. Q 
The results are in very good correspondence with both the re- 
construction of |fl9ll and the theoretical model of the POVM 
(based on independent measurements of the reflectivities of 
the beam splitters and the overall photon loss). 

To illustrate how the mean values and error bars change af- 
ter each KF iteration, we have created a movie, where each 
frame consists of a plot similar to the one of Fig.|6] generated 
after each iteration. We refer the reader to |48] for this anima- 
tion, the MatLab routines used, and other related material. 

In order to infer how many measurements are needed to 
reduce the errors, one has to look at the unconstrained confi- 
dence region. We have plotted the spectrum of the unphysical 
covariance matrix in Fig. [8] This graph allows to estimate the 
number of experimental runs N necessary to achieve a cer- 
tain final precision. It is evident from the graph that only 110 
of the 800 free components have standard deviation less than 
0.001 (\i = af < 10"^). Since variances scale as 1/iV, 
to double that number to 220, say, N should be increased 
by a factor of no less than about 100,000 (to get the (T|gQ of 
10^^ below 10~^), i.e. from 38,000 to the rather impractical 
3,800,000,000. Hence, to really achieve higher precision with 
this kind of experiment, another setup should be considered. 
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FIG. 6: (Colour online) Reconstruction of the POVM elements of the Optical POVM of Sec.|Vl The graph depicts the mean value solution 
fj,i of the diagonal entries (central wavy line (blue)) and their marginal standard deviations \/Si7 (±lcr error bars). As the actual variations on 
the diagonal entries are correlated, this plot can only give an indication. Along with the mean value solution, the regularised solution is plotted 
(central smooth line (red)). 
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FIG. 7: Reconstruction of the POVM elements 11^*'^ of the Optical 
POVM of Sec. |V] (up to photon number 30 only). This is the reg- 
ularised solution, i.e. the solution with maximal smoothness that is 
still within the confidence region obtained from the Kalman Filter 
method. The solution obtained is in complete agreement with the so- 
lution from Iil9i1 , which was obtained in a completely different way. 
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FIG. 8: Spectrum of the corrected covariance matrix Econ for the 
Optical POVM, before restricting to the physical set; the eigenvalues 
are plotted on a logarithmic scale. The values saturate at about 10^", 
due to numerical imprecision in the calculation of Econ . Eigenvalues 
800 and higher correspond to the exact constraints imposed by the 
projector Tx and are nominally 0. 



VI. DISCUSSION 

A. Comparison to other Methods 

The reconstruction method that matches ours most closely 
is the one reported in which is also based on the likeli- 
hood function and also yields a covariance matrix. Hence, this 
method allows to calculate confidence regions in the same way 
as ours. The main differences are that in |5] the point of max- 
imum likelihood is calculated first, the covariance matrix is 



calculated as the inverse of the Hessian (the second derivative 
matrix / dxidxj) of the logarithm of the likelihood func- 
tion, taken in the mode of that function, and the restriction 
to the physical set is imposed beforehand. In contrast, our 
method amounts to calculating the mean of the log-likelihood 
function, its Hessian in that mean, and the restriction to the 
physical set is made afterwards. 

First of all, we believe that our approach yields results 
that better match the exact confidence region. The likelihood 
function is highly skewed, whenever there are a lot of low- 
probability measurement outcomes; this appears to be the rule 
rather than the exception. In those cases the mean is statis- 
tically more meaningful than the mode, especially when one 
restricts to the mode over the physical set from the outset. Sec- 
ond, restricting to the physical set only in a post-processing 
phase yields valuable diagnostic information about correct- 
ness of the assumed noise model and also about the ultimate 
accuracy allowed by the particular tomographic data. As illus- 
trated in Application 1, the mode over the physical set can be 
really far from the mean, due to unforeseen noise/error con- 
tributions, and the mean has to be calculated in order to see 
that. Also, to infer how many more measurements would be 
needed to improve the reconstruction accuracy, one needs to 
look at the covariance matrix before restricting to the physical 
set, as illustrated in Application 2. 

Other reconstruction methods also calculate the MaxLik 
solution and derive eiTor measures from Monte Carlo sim- 
ulations. As such, they suffer from the same drawbacks as 
the method of fs*] in that the restriction to physical space is 
made from the outset. Moreover, the time required for the 
Monte Carlo calculations rapidly becomes prohibitive with in- 
creasing system dimensions. Even for two-qubit systems, our 
method is orders of magnitude faster than Monte Carlo meth- 
ods. A further problem with the Monte Carlo method is the 
difficulty of obtaining a reliable stopping criterion. 



B. Computational Resources 

The memory requirements of our method are easily calcu- 
lated. They are essentially governed by the dimension of the 
subspace Sx on which the state p is supported. If this dimen- 
sion is D then storage for jl consists of complex numbers, 
while for the (tilde) covariance matrix it is the square of that, 
W^. This means that for the full reconstruction of rt-qubit 
states, 2^" elements are needed for the state, and 2**" for the 
covariance matrix. 

The computation time for the Kalman filter update (exe- 
cuted once per measurement setting) is dominated by a fixed 
number of matrix multiplications (of x D'^ matrices) and 
one matrix inversion (of ?l K y. K matrix, where K is the 
number of outcomes per measurement setting and therefore is 
typically much smaller than D). As the computation complex- 
ity of a matrix multiplication for two k x k matrices is 0{k^) 
(or somewhat less), we get a computational complexity of D^. 

The optional post-processing steps of calculating the Max- 
Lik and/or MaxEnt solution require solving a semi-definite 
program. In all reported applications this turned out to be the 
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most time-consuming step. 
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VII. CONCLUSION 



In this work we have introduced a novel Bayesian tomo- 
graphic reconstruction method based on Kalman filtering that 
does not just give a maximum likelihood solution but also pro- 
duces error bars, in the form of a confidence region around a 
mean value solution. It must be stressed that the error bars are 
directly derived from the measurement data, unlike in Monte- 
Carlo methods, where they are produced from simulations. 

We have shown that to properly deal with low-probability 
events (e.g. measurement outcomes with very few clicks) one 
has to consider the conjugate distribution of the noise model, 
in the spirit of Laplace's rule of succession. That is, if click 
frequencies are distributed multinomially or Poissonian, this 
yields a distribution of the underlying click probabilities that 
is Dirichlet distributed. This avoids the incorrect assignment 
of zero probability to an outcome that has not been observed. 
Furthermore, we have introduced a novel method of ensuring 
that the reconstruction is physical. This method is again based 
on Kalman filtering, and has the benefit that it is very fast and 
again produces appropriate error bars. 

Finally, we have applied the method to two real world ap- 
plications. In the first example, the state reconstruction of an 
entangled two-qubit state, the reconstruction process reduces 
to a single application of the Kalman update equations which, 
apart from its numerical stability, reduces the computational 
effort. Compared to Monte Carlo methods for calculating 
error bars the computational effort is reduced by several or- 
ders of magnitude. The Kalman filter method also revealed 
the necessity to adjust the underlying noise model by taking 
into account additional error sources. The second example 
concerned the reconstruction of an optical POVM. There the 
advantages of Kalman filtering also became evident in one's 
capability to estimate the number of experimental runs neces- 
sary to achieve a certain final precision. Both examples indi- 
cate that our KF method can be an invaluable diagnostic tool. 

In future work we will consider how to deal with mea- 
surement imperfections, including drift in the tomographic 
and system components. We will investigate how the present 
method can be applied to tomography with continuous vari- 
able outcomes. A further topic of study will be the integration 
of the Kalman filter method within adaptive tomographic se- 
tups, as the method is very much an online method, updating 
the covariance matrix as it goes. Among the more technical 
issues, we will study the convergence properties of our pro- 
posed Kalman filter method for restricting the reconstruction 
to physical space. 

We are confident that our reconstruction method, due to its 
statistically well-founded nature, can be the basis of a depend- 
able, easily adaptable, and universal reconstruction algorithm. 
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APPENDIX A: PROOF OF THE BOUND ON THE 
PHYSICAL CONFIDENCE VALUE 



For definitions we refer back to Sec. IIIIDI We start with a 
Lemma. 

Lemma 1 Define the function 



Then for a > 0, and r < R the relation 



g{R,a) - giR,Q) 



holds. 



Proof. Consider three integrable functions on the interval 
[0, R], /, g, and h. Let / be non-negative, and g and h non- 
increasing. It is easily shown that these functions satisfy the 
inequality 



dxf{x)g{x)h{x) / da;/(a;) 
Jo 

pR. pR 

> / dxfix)gix) / dxf{x)hix). (Al) 
Jo Jo 



To see this, subtract the right-hand side from the left-hand 
side, rewrite the integrals as double integrals over the square 
[0, R] X [0, R], split up this square into two equal parts along 
the diagonal x = y, and enjoy the benefits of the integrand's 
symmetry, giving: 



dx r dyf{x)f{y)g{x){h{x) - h{y)) 
Jo 

dx r dyf{x)f{y)(g{x) {h{x) - h{y)) 

Jx ^ 

+g{y) {h{y) - h{x)) 
" dx r dyf{x)f{y) {g{y) - g{x)) {h{y) - h{x)) 

I J X 

> 0. 
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Now specialising the inequality ( lAll i to the functions 



h{x) = $(0 < a; < r) 



gives the inequahty of the lemma. 



□ 



We start from the unphysical reconstruction, that is the 
mean ^ and the covariance matrix S. Let V be the physi- 
cal set, and let po be the maximum likelihood solution, i.e. the 
state that is closest to the mean /i, in the Mahalanobis distance. 
In what follows we will use the Hilbert space representation 
of states, i.e. a representation as vectors. As before, we will 
denote this by math boldface. The discussion becomes eas- 
ier by going over to a new, "standardised" coordinate system, 
in which the mean /i is the origin and the covariance matrix 
is the identity matrix. The Mahalanobis distance is then just 
the Euclidean distance, and the confidence regions are spheres 
centered around the origin. 

In quantum mechanics, the physical set V is convex. By 
definition, po is on the boundary of V. Therefore, V can 
be decomposed into infinitesimal cones with center Pq, each 
pointing to a different direction fJ, having cross-section dO, 
and cut to certain length i?(r2), where the latter function de- 
termines the overall shape of V. 

In standardised coordinates, the unphysical posterior / is 
given by f{x) — Cexp(— a;^/2), with C the normalisation 
constant, and x = | |a;| |. We now want to calculate the cumu- 
lative distribution function (CDF) of the physical posterior, 
which is the normalised integral of / over the intersection of 
V with the ball of radius x, g{x)/ g{oo), with 

r fRm 

g{x) := dn drr'^-^<^{\\rn + paW <x) 

Jn Jo 

xexp(-||rO + po||V2). 
Let us also define the non-negative function 



gix.n) 



Then we have 



/ drr'^-^^{\\rn + po\\<x) 
Jo 

xexp(-||rn + po||V2)- 

J^dng{x,n) 



■){x)/g{oo) 



dng{^,n) 



dn 



g{oo,n) gix,n) 



^ dn'g{oo,n') g{oo,n) 



The first factor of the integrand, which we will denote by 
w{n), is a PDF, in that it is a non-negative function integrat- 
ing to 1 over n. We have thus shown the following statement: 
Statement C: The function g{x) / g{oo) is a weighted average 
of the functions g{x, Jl) / g{oo, Jl) over il. 

Let us now fix ft. The value of g{x, Jl) no longer changes 
for X beyond i?(r2). We define as that value of r for which 



I \rn + Pol I ~ X. Thus, for Rx > we have g{x, Jl) = 

g{oo,n). 

Consider now the case that x is small enough so that R^ < 
R{n). Let p = I Ipo 1 1 (the 2-norm of the vector representation 
of Po)- In fact, p — Mml as used in the bound ( |32] |. Let 6 be 
the angle between a normal to pa — p. and Jl. Because po is 
the nearest point in V to p, this angle is between and 7r/2. 

In this case we have 

= drr'^"iexp[-C(r)V2], 
Jo 

with 

- \\rn + po\f 

= p^ +r'^ + 2prsm6 

= {r + p sin 6*)^ + cos^ 9. 

This gives us 

gix,n) = exp{-{p^cos^e)/2) 



X / drr'^~^exp(-(r + psin0)2/2). 
/o 



The factor in front of the integral is independent of x and can- 
cels out in the quantity of interest g{x, r2)/g(oo, SI). Apply- 
ing the lemma we now get 

gix,n) _ J^^^ drr'^-icxp(-(r + psin6>)V2) 
g{oo,n) j«W drr'i-iexp(-(r + p sin 61)2/2) 



> 



> 



/p'^ drr'^-icxp(-rV2) 
/jf^"^ drr'^-iexp(-r2/2) 
drr'*-icxp(-rV2) 
dr r''~i exp(— r2/2) 



Now Rx satisfies the triangle inequality: 

X < p + Rx. 

Thus if we replace Rx as upper integration limit by its lower 
bound a: — p, or if the difference is negative, then we get a 
lower bound on the integral too, giving 



5(00, n) 



> 



/o" " drr'^-iexp(-rV2) 
dr r'^"! exp(— r2/2) 



The upshot of this step is that the right-hand side is now com- 
pletely independent of fl, which allows us to invoke State- 
ment C and get that g{x)/g{oo) satisfies the same inequality: 



9{x) ^ Jo 



drr° ^ exp(--r2/2) 



5(00) 



drr'^ iexp(— r2/2) 



The right-hand side is the CDF of the chi distribution (with 
d — 1 degrees of freedom) evaluated in x — p, i.e. the CDF is 
shifted to the right by an amount p = A4ml- Its confidence 
region is therefore the interval [0, McR,unphvs + Mail]- 
The left-hand side is the CDF of the restricted posterior, with 
confidence region [0, AicR.phys]- Because of the inequality, 
the latter confidence region is contained in the former. That 
proves the bound (l32l i. □ 
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APPENDIX B: PROPERTIES OF THE DIRICHLET 
ESTIMATOR 

1. Mode V Confidence Region 

Here we give the promised proof that the mode of the 
Dirichlet distribution lies within the confidence region as de- 
fined in (|29] |. with fi and S given by ^ and (|7]i. 

Proof. Let x be the mode of the Dirichlet distribution, x = 
f /N, and fihe its mean, /x = {f + 1)/ {N + d). Thenx — fi — 
{df - N)/{N{N + d)); as the sum of the entries of x - fi 
is 0, X — fi lies in the subspace on which G of ^ projects. 
Thus we have 

= {x- (i)*T,1{x- (i) 

^ iV + d + 1 A (df, - 
N^N + d)'^^ + l 

If r is the number of non-zero components of / (thus 1 < r < 
d) and if we put fi = Npi, fixing p,;, then this expression can 
be expanded as d — r + 0(l/iV). The term X]f=i(/j + 1)^^ is 
maximal for / = {N, 0, . . . , 0), giving the sum [N + 1)^^ + 
d— 1. In this way we get the upper bound 

< ^f^V id -l) = d-l + 0[llN). 

For not too small values of N, this bound is approximately 
equal to d — 1, which is also the number of degrees of freedom 
V in this case. As d — 1 is the mean value of the Xd-i distri- 
bution, the value d — 1 lies within any reasonable confidence 



interval. Therefore, the mode of the Dirichlet distribution lies 
within the confidence region of its normal approximation. □ 



2. Wald statistic 

Suppose the actual state under consideration is p, and a 
measurement is made using a d-outcome POVM, so that the 
probabilities of the outcomes are given by the probability vec- 
tor p. In an experiment this gives rise to certain outcome fre- 
quencies /, drawn from a multinomial distribution with pa- 
rameters N and p. From these frequencies / one can derive 
an estimation P of p, Dirichlet distributed with parameter / 
according to the prescription of Sec. lIIIBl Let fi and S be the 
moments of this Dirichlet estimation. 

We want to study how well the actual p fits within the con- 
fidence region obtained from this estimation. To do so, we 
construct the Wald statistic 

z := {p - n)*i:\p - n) 

jv + d + 1 A ((iv + d>, -CA + i))^ 

N + d f, + l 

= (N + d+l) (^iN + d)J2j^-^y 

If the distribution involved was Gaussian, this statistic would 
be Xd-i distributed. In reality, the distribution only tends to 
a Gaussian and the Wald statistic is only asymptotically Xd-i 
[46]. 

An exact calculation yields the first two moments of z, 
given that F is distributed as ^ Mtn(A^, p), in terms of 
p: 



[N+l] 



Y^p.p, [1 + (1 -p, - {l-P^r^' - (1 



[N + d+l) 

{N + d+lf{N + df \ N +1 



(Bl) 



7V + 2 



+ (iV + l)^g(K,7V)- [l-^p,,(l- 



^Af+l 



(B2) 



where the function g(p, N) is defined as The sum g{p, N) is related to the first inverse moment of 

the positive (i.e. non-zero) binomial distribution 

N+l 



fc=i ^ ^ 



by 

5(p, 7V)=pVi(p,^+1)- 
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FIG. 9: Graph of ii{Z) (lower curve) and (t(Z) (upper curve), from 
Eq. dBlt and the square root of JB2b . as function of p, for d = 2 and 
iV = 100. 



No closed form for inverse moments exists, but several expan- 
sions are known (see, e.g. Ref. |47f] and references therein). 
For large N, one can approximate the binomial distribution 
by a Poisson distribution with mean /i = Np. For the first in- 
verse moment, this gives an approximation by the known first 
inverse moment of the Poisson distribution, with relative error 
of the order 1 /TV: 

N) « fiNp), /(/i) = e-''(Ei(M) - log - 7); 

here, Ei(a;) is the exponential integral function and 7 is the 
Euler-Mascheroni constant. Thus, we get 

g{p, N)^{N + ^3e-^(Ei(/i) ^logfi- 7) 

with fj, = {N + l)p. To obtain o'^(Z), however, g has to be 
multiplied by a constant of order O(N^), and as (t'^{Z) turns 



out to be of order 0(1), we need to know g with a relative 
precision of order 0{1/N^). This requires correction terms of 
i of up to second order. According to the recipe described 
in 14711 . the required approximate formula for is given by 



24(iV+ 1)2 



with fi = (N + l)p and 



f{^l) = e-nEi(M)-logM-7) 

A{fi) ^ 3//-8Ai^-12(iV+l)/i2 + 24(iV+l)2 

B{fi) = 12fi^ -6fi^ -24:{N + 1)^- {12N + 10) 

C{^i) = + + {UN +U)fi+ {UN +10). 



For the 2-dimensional case, with N = 100, the resulting 
values for ij,{Z) and a{Z) are plotted as function of p in 
Fig. |9l When p is sufficiently far removed from the endpoints 
or 1, one sees that /i and a converge to their values 1 
(/i^2 = d - 1) and V2 (cr^2 = ^^2(^-1)). 

More generally, good convergence occurs when the small- 
est Pi is still larger than about 20/N, i.e. when every out- 
come has at least 20 clicks. Numerical studies reveal that 
the highest value of cr occurs roughly when the smallest pi 
is about 7.2/ N. In turn, this highest value of a is maxi- 
mal when all pi bar one are equal to 7.2/ N. This worst 
case value is approximately given by the empirical formula 
1.285(1 + 2(d-3/2)/iV)CT^2. 

This gives us the following conservative approach: Take the 
chi-square value for a{Z) whenever the smallest pi is larger 
than 20/ N, and 1.285(1 + 2(d-3/2) /N) times the chi-square 
value otherwise. 
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